MLOps / Infra Jobs

ML infrastructure and operations roles — model serving, training pipelines, monitoring, deployment.

50 open positions

Storage Architect

Graphcore

Austin, Texas, United States; US - Milpitas · senior · 19h ago

Graphcore seeks a Storage Architect to design and optimize high-performance storage systems for AI data centers, specializing in NVMe SSDs, PCIe topologies, and Linux kernel tuning. The role focuses on eliminating I/O bottlenecks for GPU training and inference workloads, including GPU-direct storage optimization and telemetry-driven system design. This is a critical infrastructure position requiring deep expertise in storage hardware and software optimization at massive scale.

MLOPS INFRA · SYSTEM DESIGN

Staff UEFI Engineer

Graphcore

Austin, Texas, United States; US - Milpitas · staff · 19h ago

Graphcore seeks an experienced Staff UEFI Engineer to design and deploy firmware for AI server platforms, focusing on system initialization and hardware configuration in large-scale data center environments. This role combines deep firmware expertise with infrastructure engineering to support next-generation AI compute hardware.

SYSTEM DESIGN · MLOPS INFRA

Staff System Software Engineer

Graphcore

Bristol, UK · staff · 19h ago

Graphcore seeks a Staff System Software Engineer to design, implement, and test low-level kernel drivers and user-space driver libraries as part of their system software team. This role focuses on building the critical driver infrastructure for Graphcore's AI compute hardware stack. The position requires deep expertise in systems programming and hardware-software integration to support their datacenter-scale AI compute platform.

SYSTEM DESIGN · MLOPS INFRA

Staff Microcontroller Firmware Developer

Graphcore

Austin, Texas, United States; US - Milpitas · staff · 19h ago

Graphcore seeks a Staff Microcontroller Firmware Developer to design and implement firmware for microcontroller-based management systems in AI server and rack-scale platforms. The role focuses on Zephyr RTOS-based firmware development and low-level device drivers for real-time embedded systems supporting hyperscale data center infrastructure. This is a senior-level engineering position requiring deep expertise in embedded firmware development and hardware-software integration.

SYSTEM DESIGN · MLOPS INFRA

Staff Machine Learning Engineer (Large Systems)

Graphcore

Cambridge, UK · staff · 19h ago

This is a Staff Machine Learning Engineer role at Graphcore focused on developing and optimizing AI models for specialized hardware at scale. The position involves implementing cutting-edge ML models, optimizing performance across thousands of accelerators, and collaborating with software and research teams to advance AI compute technology. Candidates should have strong expertise in distributed training, model optimization, and large-scale system implementation.

MACHINE LEARNING · SYSTEM DESIGN · MLOPS INFRA

Staff Firmware Validation Engineer

Graphcore

US - Milpitas · staff · 19h ago

Graphcore seeks a Staff Firmware Validation Engineer to ensure quality and reliability of firmware across ARM-based AI server platforms, including SoC firmware (UEFI), OpenBMC, and rack management systems. This role is critical for validating the infrastructure powering hyperscale AI deployments and requires deep expertise in firmware testing and system-level validation.

SYSTEM DESIGN · MLOPS INFRA

Staff Bring-Up and Characterisation Engineer

Graphcore

Austin, Texas, United States · staff · 19h ago

Graphcore seeks a Staff Bring-Up and Characterisation Engineer to lead the bring-up and validation of cutting-edge AI silicon processors and high-performance blade systems at their new Austin AI Engineering Campus. This role requires deep expertise in hardware characterization, silicon validation, and cross-functional coordination with architecture, silicon engineering, and product teams. The position offers a competitive salary of $198,100-$268,000 plus phantom equity, targeting an experienced engineer capable of managing complex hardware platform launches.

SYSTEM DESIGN · MLOPS INFRA

Software Infrastructure Kubernetes Engineer

Graphcore

Bristol, UK; Cambridge, UK · mid · 19h ago

Graphcore seeks a Kubernetes-focused Software Infrastructure engineer to develop and manage critical platforms supporting their AI compute teams. The role involves building CI/CD pipelines, deployment systems, and infrastructure services for machine learning software components on HPC platforms. You'll work in cross-functional squads to eliminate operational toil and deliver long-term engineering solutions.

MLOPS INFRA · SYSTEM DESIGN

Software Engineer in Build Engineering

Graphcore

Bristol, UK; Cambridge, UK · mid · 19h ago

Join Graphcore's new Build Engineering team to develop critical infrastructure tools that power their ML software stack. You'll optimize build, test, and deployment processes for high-performance AI platforms while working with distributed systems and collaborating closely with QA and development teams.

MLOPS INFRA · SYSTEM DESIGN

Software Engineer

Graphcore

Gdańsk, Pomeranian Voivodeship, Poland · mid · 19h ago

Graphcore seeks a Software Engineer to develop their Collectives Communication Library for AI hardware accelerators, focusing on high-bandwidth, low-latency distributed computing primitives. The role involves designing and implementing complex software systems that integrate custom hardware with existing AI ecosystems, requiring strong systems programming and distributed systems expertise.

SYSTEM DESIGN · MLOPS INFRA · MACHINE LEARNING

Senior Technical Program Manager

Graphcore

Bristol, UK; Cambridge, UK; London, UK · senior · 19h ago

Senior Technical Program Manager role at Graphcore focused on bridging technical and management domains for AI compute infrastructure. Responsibilities include liaising between technical teams, managing project schedules, and ensuring delivery of integrated hardware-software systems across workload management, systems management, and observability platforms. Requires deep technical expertise combined with program management excellence to coordinate multi-functional teams in a complex semiconductor and AI infrastructure context.

MLOPS INFRA · SYSTEM DESIGN

Senior Staff ML Engineer

Graphcore

Bristol, UK · staff · 19h ago

Senior Staff ML Engineer role at Graphcore focused on testing, validating, and benchmarking a complex ML software stack across AI accelerator hardware. The ideal candidate has deep experience with ML frameworks, model training/execution, and debugging functional and performance issues while collaborating across software and hardware teams.

MACHINE LEARNING · MLOPS INFRA · EVALS

Senior Staff ML Engineer

Graphcore

Cambridge, UK · staff · 19h ago

Senior Staff ML Engineer role at Graphcore focused on testing, validating, and benchmarking complex ML software stacks across AI accelerator hardware. The position requires deep expertise in ML systems, hands-on debugging of functional and performance issues, and close collaboration with software and hardware teams to ensure reliability and correctness across modern AI workloads.

MACHINE LEARNING · MLOPS INFRA · EVALS

Senior Staff Engineer - Telemetry

Graphcore

Gdańsk, Pomeranian Voivodeship, Poland · staff · 19h ago

Senior Staff Engineer role focused on designing and deploying scalable management and observability solutions for AI infrastructure at Graphcore. The position requires architecting monitoring systems, establishing infrastructure controls, and creating reference designs that bridge internal engineering efforts with customer deployments.

MLOPS INFRA · SYSTEM DESIGN

Senior Software Engineer

Graphcore

Gdańsk, Pomeranian Voivodeship, Poland · senior · 19h ago

Senior Software Engineer role at Graphcore focused on designing and developing a large-scale collective communication simulator for new AI hardware. Requires expertise in complex software systems, hardware integration, and distributed computing with leadership responsibilities for mentoring junior engineers and driving technical excellence.

SYSTEM DESIGN · MLOPS INFRA · MACHINE LEARNING

Senior Quality Assurance Engineer - Workload Management

Graphcore

Gdańsk, Pomeranian Voivodeship, Poland · senior · 19h ago

Senior QA Engineer role focused on validating Kubernetes integrations for next-generation AI accelerator hardware at Graphcore. The position involves testing workload scheduling, orchestration, and resource utilization across distributed computing environments within the SoftBank AI ecosystem. This is a critical infrastructure testing role bridging hardware validation with cloud-native deployment systems.

MLOPS INFRA · SYSTEM DESIGN

Senior Machine Learning Engineer (Large Systems)

Graphcore

Bristol, UK · senior · 19h ago

This role focuses on developing and optimizing AI models for Graphcore's specialized hardware at large scale, working across distributed systems spanning thousands of accelerators. The engineer will collaborate with software and research teams to implement cutting-edge models, benchmark performance, optimize kernels, and contribute to reference applications that showcase the hardware's capabilities. Success requires deep expertise in machine learning implementation, system-level optimization, and the ability to identify and resolve performance bottlenecks in production-scale AI systems.

MACHINE LEARNING · SYSTEM DESIGN · MLOPS INFRA

Senior Machine Learning Engineer (Large Systems)

Graphcore

London, UK · senior · 19h ago

Senior ML Engineer role focused on developing and optimizing AI models for specialized hardware at massive scale (1000s of accelerators). The position requires expertise in model implementation, performance optimization, and distributed systems while working closely with software and research teams to advance Graphcore's AI compute technology.

MACHINE LEARNING · SYSTEM DESIGN · MLOPS INFRA

Senior Kubernetes Software Engineer (Go)

Graphcore

Gdańsk, Pomeranian Voivodeship, Poland · senior · 19h ago

Graphcore seeks a Senior Kubernetes Software Engineer to develop Go-based services that integrate AI accelerator hardware into Kubernetes clusters. This role focuses on building production-grade software components at the intersection of hardware, software, and cloud platforms, including CRDs, operators, and cloud-native infrastructure solutions.

MLOPS INFRA · SYSTEM DESIGN

Quality Assurance Engineer - Workload Management

Graphcore

Gdańsk, Pomeranian Voivodeship, Poland · mid · 19h ago

This QA Engineer role focuses on validating Kubernetes integrations for next-generation AI accelerator hardware within the Workload Management team at Graphcore. The position involves testing and ensuring efficient scheduling, orchestration, and utilization of AI accelerators across distributed computing environments. This is a critical infrastructure role bridging AI hardware and cloud-native orchestration platforms.

MLOPS INFRA · SYSTEM DESIGN

Principal Software Architect

Graphcore

Bristol, UK; Cambridge, UK · principal · 19h ago

Graphcore seeks a Principal Software Architect to define and drive the architectural vision of their ML accelerator's software stack, spanning firmware to ML frameworks. The role involves designing coherent end-to-end architecture, communicating complex technical vision across engineering teams, and maintaining architectural integrity as the product evolves. This is a high-level technical leadership position requiring deep expertise in system design and hardware-software interactions.

SYSTEM DESIGN · MACHINE LEARNING · MLOPS INFRA

Principal Power Engineer

Graphcore

US - Milpitas · principal · 19h ago

Graphcore seeks a Principal Power Engineer to architect power delivery systems from grid to chip for their AI computing platforms. The role requires deep expertise in power electronics and infrastructure design, collaborating across hardware, firmware, and data center operations teams. This is a strategic leadership position focused on ensuring performance, reliability, and scalability for next-generation AI computing systems.

SYSTEM DESIGN · MLOPS INFRA

Principal Power Engineer

Graphcore

US - Milpitas · principal · 19h ago

Principal Power Engineer role at Graphcore focused on architecting power delivery systems from grid to chip for AI computing hardware. Requires deep expertise in power distribution, semiconductor systems, and data center infrastructure with cross-functional leadership across hardware, firmware, and operations teams. Based in Austin or San Jose with flexible remote work options, offering competitive compensation with phantom equity.

SYSTEM DESIGN · MLOPS INFRA

Principal Hardware Diagnostics Engineer

Graphcore

Austin, Texas, United States; US - Milpitas · principal · 19h ago

Graphcore seeks a Principal Hardware Diagnostics Engineer to design and develop advanced diagnostics software for monitoring hardware health and diagnosing system-level issues across their AI infrastructure platforms. The role involves building diagnostics agents, tools, and analytics frameworks to help engineers and automation systems identify, isolate, and resolve hardware issues at the blade and rack-scale cluster levels.

MLOPS INFRA · SYSTEM DESIGN

Graduate Electrical Engineer

Graphcore

US - Milpitas · junior · 19h ago

Graphcore seeks a recent graduate in Electrical or Computer Engineering to join their Server Design team in Austin, developing next-generation AI infrastructure for hyperscale data centers. The role involves designing cutting-edge AI servers and power distribution systems while collaborating with experienced engineers across multiple disciplines to deliver high-performance, power-efficient solutions for modern AI workloads.

SYSTEM DESIGN · MLOPS INFRA

Distinguished Engineer - Inference Serving Network and Storage

Graphcore

Austin, Texas, United States · principal · 19h ago

Distinguished Engineer role leading end-to-end networking and storage architecture for large-scale AI inference serving systems at Graphcore. Responsible for defining serving fabric design, KV cache management, storage strategies, and driving cross-functional technical decisions that impact product differentiation and competitive advantage. Chief technologist position requiring expert-level technical leadership, strategic thinking, and influence across organizational boundaries.

MLOPS INFRA · SYSTEM DESIGN

BMC Engineer

Graphcore

Gdańsk, Pomeranian Voivodeship, Poland · mid · 19h ago

This role focuses on developing and maintaining OpenBMC software stacks for AI server management hardware, including kernel drivers for ASPEED devices and Redfish API enhancements. The engineer will work cross-functionally with firmware, hardware, and business teams to deliver enterprise-grade baseboard and rack management solutions for Graphcore's AI compute infrastructure. This is a systems-level engineering position critical to the deployment of large-scale AI datacenter environments.

SYSTEM DESIGN · MLOPS INFRA

Asset & Inventory Operations Coordinator

Graphcore

US - Milpitas · mid · 19h ago

Graphcore seeks an Asset & Inventory Operations Coordinator to manage end-to-end material lifecycle for R&D infrastructure, including GPU components, custom accelerators, and data center equipment across global lab networks. The role involves building frameworks, tools, and workflows to ensure tracking, delivery, compliance, and operational excellence for engineering labs and infrastructure systems.

MLOPS INFRA · SYSTEM DESIGN

Asset & Inventory Operations Coordinator

Graphcore

Austin, Texas, United States · mid · 19h ago

Graphcore seeks an Asset & Inventory Operations Coordinator to manage end-to-end material lifecycle for R&D infrastructure, supporting engineering labs and data centers with component tracking and delivery. The role involves building frameworks, tools, and workflows for asset management, compliance, and operational excellence across a global lab network. This is a critical operational position supporting advanced AI computing hardware infrastructure at a SoftBank Group company.

MLOPS INFRA · SYSTEM DESIGN

2026 Summer Intern - Software Engineering - Drivers

Graphcore

Bristol, UK · junior · 19h ago

Graphcore seeks a 2026 Summer Intern to join the Drivers and Utilities team, developing kernel and user space software that enables customers to maximize performance from Graphcore's AI hardware (IPU). The role involves writing production-quality code with comprehensive testing, participating in code reviews and technical design discussions, and maintaining CI/CD systems as part of a cross-functional team supporting the Poplar SDK.

MLOPS INFRA · SYSTEM DESIGN

2026 Graduate Software Engineer - Neuro Engine Modelling

Graphcore

Bristol, UK · junior · 19h ago

Graduate software engineer role focused on modeling and simulating next-generation AI hardware at Graphcore. You'll contribute to design, implementation, and testing of hardware software using C/C++/Python while working in agile scrum teams alongside hardware and framework engineers. The position involves integration of modeling capabilities into ML stacks and supporting distributed development environments.

MACHINE LEARNING · SYSTEM DESIGN · MLOPS INFRA

2026 Graduate Software Engineer - Drivers

Graphcore

Bristol, UK · junior · 19h ago

A graduate-level software engineering role focused on developing kernel and user-space drivers for Graphcore's AI accelerator hardware (IPU). The position involves building system software, writing well-tested code, and collaborating with hardware and ML teams to optimize performance and enable customer applications on Graphcore's specialized AI compute platforms.

MLOPS INFRA · SYSTEM DESIGN

2026 Graduate Software Engineer - DevOps

Graphcore

Bristol, UK · junior · 19h ago

A graduate-level DevOps engineering role at Graphcore focused on building infrastructure and tools for AI compute systems. You'll work on scaling infrastructure, CI/CD pipelines, and deployment processes for machine learning software components while gaining experience with distributed systems and HPC platforms.

MLOPS INFRA · SYSTEM DESIGN

2026 Graduate IT Infrastructure Engineer

Graphcore

Bristol, UK · junior · 19h ago

A graduate-level IT Infrastructure Engineer role at Graphcore focused on designing, building, and maintaining enterprise-scale infrastructure that supports AI hardware and workloads. The position involves hands-on experience with on-premise systems, cloud platforms, networking, automation, and infrastructure-as-code across an organization developing next-generation AI compute hardware.

MLOPS INFRA · SYSTEM DESIGN

Staff Fullstack Engineer - Onboard Experience

Wayve

London · staff · 19h ago

Lead fullstack engineer for the autonomous vehicle onboard experience, responsible for architecting and maintaining real-time in-vehicle web visualization that displays AI perception and decision-making to operators. Role spans frontend, backend, and embedded systems while balancing rapid iteration with safety-critical reliability and performance constraints.

COMPUTER VISION · SYSTEM DESIGN · MLOPS INFRA

Software Integration Engineer (6 months Contract)

Wayve

Sunnyvale · mid · 19h ago

Wayve seeks a Software Integration Engineer to bring up and validate their autonomous driving AI software stack on customer hardware platforms including NVIDIA Drive and Qualcomm Ride SoCs. The role focuses on porting Linux/QNX/AUTOSAR systems, integrating drivers and middleware, and collaborating with verification and OEM teams to ensure reliable CI/CD and test infrastructure integration for self-driving vehicles.

COMPUTER VISION · SYSTEM DESIGN · MLOPS INFRA

Software Engineer - System Performance, Robot Software

Wayve

Sunnyvale · mid · 19h ago

Join Wayve's Robot Software team as a System Performance engineer responsible for building observability, profiling tools, and infrastructure that optimize software performance across the autonomous vehicle fleet. You will enable model developers and scientists to iterate quickly by ensuring reliable, efficient compute utilization and providing the tools needed to detect and resolve performance bottlenecks in onboard systems.

SYSTEM DESIGN · COMPUTER VISION · MLOPS INFRA

Site Reliability Engineer

Wayve

Japan · mid · 19h ago

Site Reliability Engineer at Wayve focused on ensuring operational excellence and reliability of autonomous vehicle systems. The role emphasizes infrastructure automation, monitoring, and metrics mastery to keep AI-driven autonomous vehicles robust and resilient in production environments. This position bridges ML systems with enterprise infrastructure, requiring strong DevOps and systems engineering expertise.

MLOPS INFRA · SYSTEM DESIGN · COMPUTER VISION

Senior Site Reliability Engineer, Vehicle SW

Wayve

London · senior · 19h ago

Senior Site Reliability Engineer role at Wayve focused on maintaining reliability, observability, and safety of autonomous vehicle fleets operating on public roads. The position bridges software, hardware, and operations to transform real-world incidents into lasting improvements while supporting fleet scaling and faster deployment cycles.

SYSTEM DESIGN · MLOPS INFRA · COMPUTER VISION

Senior Fullstack Engineer - Data Enrichment

Wayve

London · senior · 19h ago

Senior Fullstack Engineer role at Wayve (autonomous driving AI company) to build an internal web application for browsing computer vision models and exploring results. You'll partner with internal stakeholders to design and ship an MVP quickly, then iterate on user feedback while integrating with Databricks and existing backend services.

COMPUTER VISION · MLOPS INFRA · SYSTEM DESIGN

Release Manager

Wayve

Sunnyvale · mid · 19h ago

Release Manager at Wayve responsible for coordinating autonomous driving software releases with reliability, stability, and predictable timing. The role requires balancing fast-paced innovation with disciplined release practices, change management, and cross-team coordination to support fleet operations and product development.

SYSTEM DESIGN · MLOPS INFRA

Principal Application Software Engineer - Relocation to Tokyo

Wayve

Singapore · principal · 19h ago

Wayve seeks a Principal Application Software Engineer to lead the localization of their autonomous driving AI stack for the Japanese market. The role focuses on hardware bring-up, OS porting (Linux, QNX, AUTOSAR), and validation across diverse automotive SoC platforms while collaborating with cross-functional teams on OEM integration.

COMPUTER VISION · SYSTEM DESIGN · MLOPS INFRA

Field Engineer

Wayve

London · mid · 19h ago

Wayve seeks a hands-on Field Engineer to diagnose and resolve autonomous vehicle software and hardware issues in their fleet operations. The role requires rapid fault triage with 5-minute response times, collaboration with global teams, and adherence to strict safety protocols while helping establish new operational processes.

COMPUTER VISION · SYSTEM DESIGN · MLOPS INFRA

Customer Integration Engineer

Wayve

Sunnyvale · mid · 19h ago

This Customer Integration Engineer role supports Wayve's autonomous driving AI technology throughout the customer development lifecycle, from integration to validation. The position requires hands-on technical expertise to diagnose issues, bridge customer and internal engineering teams, and ensure seamless deployment of mapless AI driver systems into vehicles. Success demands strong system understanding, real-time troubleshooting skills, and close collaboration with automotive engineering teams.

COMPUTER VISION · SYSTEM DESIGN · MLOPS INFRA

Application Software Engineer - Relocation to Tokyo

Wayve

Detroit · mid · 19h ago

Wayve seeks an Application Software Engineer to localize and deploy their Embodied AI autonomous driving stack on Japanese customer hardware platforms. The role focuses on software bring-up, hardware integration, and validation across diverse automotive SoCs and operating systems (Linux, QNX, AUTOSAR) in collaboration with cross-functional teams.

COMPUTER VISION · SYSTEM DESIGN · MLOPS INFRA

Solutions Engineer

Speechmatics

Cambridge, England, United Kingdom · mid · 19h ago

A customer-facing Solutions Engineer role focused on speech technology evaluation and adoption. Requires hands-on Python programming, cloud/infrastructure deployment experience, and strong technical communication skills to guide customers through demos, PoCs, and integrations with Speechmatics' speech recognition platform.

NLP LLMS · SYSTEM DESIGN · MLOPS INFRA

Solutions Engineer

Speechmatics

Belgrade, Belgrade, Serbia · mid · 19h ago

This Solutions Engineer role focuses on helping customers evaluate and adopt Speechmatics' speech recognition technology through technical pre-sales and proof-of-concept delivery. The position requires hands-on Python programming, cloud/infrastructure deployment experience, and strong customer-facing communication skills to guide technical evaluation from discovery through implementation.

NLP LLMS · SYSTEM DESIGN · MLOPS INFRA

Solutions Engineer

Speechmatics

London, England, United Kingdom · mid · 19h ago

A Solutions Engineer role at Speechmatics focused on speech technology adoption, requiring hands-on technical expertise in Python, cloud deployment, and API integrations. The position bridges sales and engineering, demanding strong customer-facing skills alongside the ability to build demos, write code, and design technical solutions for speech recognition implementations.

NLP LLMS · SYSTEM DESIGN · MLOPS INFRA

Solutions Engineer

Speechmatics

Mexico City, Mexico City, Mexico · mid · 19h ago

Solutions Engineer role focused on helping customers evaluate and implement Speechmatics' speech recognition technology. Requires hands-on technical expertise in Python, cloud/Linux deployments, and API integrations, combined with strong customer-facing communication skills and ability to build demos and PoCs from discovery through validation.

NLP LLMS · SYSTEM DESIGN · MLOPS INFRA

Software Engineer

Speechmatics

Cambridge, England, United Kingdom · mid · 19h ago

Speechmatics is seeking a Software Engineer to design and build scalable, resilient systems that integrate AI transcription models into production. The role emphasizes system architecture, performance optimization, and cross-team collaboration. It requires strong engineering fundamentals; prior audio/speech domain experience is not required.

SYSTEM DESIGN · MLOPS INFRA