MLOps / Infra Jobs
ML infrastructure and operations roles — model serving, training pipelines, monitoring, deployment.
50 open positions
Storage Architect
Graphcore
Graphcore seeks a Storage Architect to design and optimize high-performance storage systems for AI data centers, specializing in NVMe SSDs, PCIe topologies, and Linux kernel tuning. The role focuses on eliminating I/O bottlenecks for GPU training and inference workloads, including GPU-direct storage optimization and telemetry-driven system design. This is a critical infrastructure position requiring deep expertise in storage hardware and software optimization at massive scale.
Staff UEFI Engineer
Graphcore
Graphcore seeks an experienced Staff UEFI Engineer to design and deploy firmware for AI server platforms, focusing on system initialization and hardware configuration in large-scale data center environments. This role combines deep firmware expertise with infrastructure engineering to support next-generation AI compute hardware.
Staff System Software Engineer
Graphcore
Graphcore seeks a Staff System Software Engineer to design, implement, and test low-level kernel drivers and user-space driver libraries as part of their system software team. This role focuses on building the critical driver infrastructure for Graphcore's AI compute hardware stack. The position requires deep expertise in systems programming and hardware-software integration to support their datacenter-scale AI compute platform.
Staff Microcontroller Firmware Developer
Graphcore
Graphcore seeks a Staff Microcontroller Firmware Developer to design and implement firmware for microcontroller-based management systems in AI server and rack-scale platforms. The role focuses on Zephyr RTOS-based firmware development and low-level device drivers for real-time embedded systems supporting hyperscale data center infrastructure. This is a senior-level engineering position requiring deep expertise in embedded firmware development and hardware-software integration.
Staff Machine Learning Engineer (Large Systems)
Graphcore
This is a Staff Machine Learning Engineer role at Graphcore focused on developing and optimizing AI models for specialized hardware at scale. The position involves implementing cutting-edge ML models, optimizing performance across thousands of accelerators, and collaborating with software and research teams to advance AI compute technology. Candidates should have strong expertise in distributed training, model optimization, and large-scale system implementation.
Staff Firmware Validation Engineer
Graphcore
Graphcore seeks a Staff Firmware Validation Engineer to ensure quality and reliability of firmware across ARM-based AI server platforms, including SoC firmware (UEFI), OpenBMC, and rack management systems. This role is critical for validating the infrastructure powering hyperscale AI deployments and requires deep expertise in firmware testing and system-level validation.
Staff Bring-Up and Characterisation Engineer
Graphcore
Graphcore seeks a Staff Bring-Up and Characterisation Engineer to lead the bring-up and validation of cutting-edge AI silicon processors and high-performance blade systems at their new Austin AI Engineering Campus. This role requires deep expertise in hardware characterization, silicon validation, and cross-functional coordination with architecture, silicon engineering, and product teams. The position offers a competitive salary of $198,100-$268,000 plus phantom equity, targeting an experienced engineer capable of managing complex hardware platform launches.
Software Infrastructure Kubernetes Engineer
Graphcore
Graphcore seeks a Kubernetes-focused Software Infrastructure engineer to develop and manage critical platforms supporting their AI compute teams. The role involves building CI/CD pipelines, deployment systems, and infrastructure services for machine learning software components on HPC platforms. You'll work in cross-functional squads to eliminate operational toil and deliver long-term engineering solutions.
Software Engineer in Build Engineering
Graphcore
Join Graphcore's new Build Engineering team to develop critical infrastructure tools that power their ML software stack. You'll optimize build, test, and deployment processes for high-performance AI platforms while working with distributed systems and collaborating closely with QA and development teams.
Software Engineer
Graphcore
Graphcore seeks a Software Engineer to develop their Collectives Communication Library for AI hardware accelerators, focusing on high-bandwidth, low-latency distributed computing primitives. The role involves designing and implementing complex software systems that integrate custom hardware with existing AI ecosystems, requiring strong systems programming and distributed systems expertise.
Senior Technical Program Manager
Graphcore
Senior Technical Program Manager role at Graphcore focused on bridging technical and management domains for AI compute infrastructure. Responsibilities include liaising between technical teams, managing project schedules, and ensuring delivery of integrated hardware-software systems across workload management, systems management, and observability platforms. Requires deep technical expertise combined with program management excellence to coordinate multi-functional teams in a complex semiconductor and AI infrastructure context.
Senior Staff ML Engineer
Graphcore
Senior Staff ML Engineer role at Graphcore focused on testing, validating, and benchmarking a complex ML software stack across AI accelerator hardware. The ideal candidate has deep experience with ML frameworks, model training/execution, and debugging functional and performance issues while collaborating across software and hardware teams.
Senior Staff ML Engineer
Graphcore
Senior Staff ML Engineer role at Graphcore focused on testing, validating, and benchmarking complex ML software stacks across AI accelerator hardware. The position requires deep expertise in ML systems, hands-on debugging of functional and performance issues, and close collaboration with software and hardware teams to ensure reliability and correctness across modern AI workloads.
Senior Staff Engineer - Telemetry
Graphcore
Senior Staff Engineer role focused on designing and deploying scalable management and observability solutions for AI infrastructure at Graphcore. The position requires architecting monitoring systems, establishing infrastructure controls, and creating reference designs that bridge internal engineering efforts with customer deployments.
Senior Software Engineer
Graphcore
Senior Software Engineer role at Graphcore focused on designing and developing a large-scale collective communication simulator for new AI hardware. Requires expertise in complex software systems, hardware integration, and distributed computing with leadership responsibilities for mentoring junior engineers and driving technical excellence.
Senior Quality Assurance Engineer - Workload Management
Graphcore
Senior QA Engineer role focused on validating Kubernetes integrations for next-generation AI accelerator hardware at Graphcore. The position involves testing workload scheduling, orchestration, and resource utilization across distributed computing environments within the SoftBank AI ecosystem. This is a critical infrastructure testing role bridging hardware validation with cloud-native deployment systems.
Senior Machine Learning Engineer (Large Systems)
Graphcore
This role focuses on developing and optimizing AI models for Graphcore's specialized hardware at large scale, working across distributed systems spanning thousands of accelerators. The engineer will collaborate with software and research teams to implement cutting-edge models, benchmark performance, optimize kernels, and contribute to reference applications that showcase the hardware's capabilities. Success requires deep expertise in machine learning implementation, system-level optimization, and the ability to identify and resolve performance bottlenecks in production-scale AI systems.
Senior Machine Learning Engineer (Large Systems)
Graphcore
Senior ML Engineer role focused on developing and optimizing AI models for specialized hardware at massive scale (thousands of accelerators). The position requires expertise in model implementation, performance optimization, and distributed systems while working closely with software and research teams to advance Graphcore's AI compute technology.
Senior Kubernetes Software Engineer (Go)
Graphcore
Graphcore seeks a Senior Kubernetes Software Engineer to develop Go-based services that integrate AI accelerator hardware into Kubernetes clusters. This role focuses on building production-grade software components at the intersection of hardware, software, and cloud platforms, including CRDs, operators, and cloud-native infrastructure solutions.
Quality Assurance Engineer - Workload Management
Graphcore
This QA Engineer role focuses on validating Kubernetes integrations for next-generation AI accelerator hardware within the Workload Management team at Graphcore. The position involves testing and ensuring efficient scheduling, orchestration, and utilization of AI accelerators across distributed computing environments. This is a critical infrastructure role bridging AI hardware and cloud-native orchestration platforms.
Principal Software Architect
Graphcore
Graphcore seeks a Principal Software Architect to define and drive the architectural vision of their ML accelerator's software stack, spanning firmware to ML frameworks. The role involves designing coherent end-to-end architecture, communicating complex technical vision across engineering teams, and maintaining architectural integrity as the product evolves. This is a high-level technical leadership position requiring deep expertise in system design and hardware-software interactions.
Principal Power Engineer
Graphcore
Graphcore seeks a Principal Power Engineer to architect power delivery systems from grid to chip for their AI computing platforms. The role requires deep expertise in power electronics and infrastructure design, collaborating across hardware, firmware, and data center operations teams. This is a strategic leadership position focused on ensuring performance, reliability, and scalability for next-generation AI computing systems.
Principal Power Engineer
Graphcore
Principal Power Engineer role at Graphcore focused on architecting power delivery systems from grid to chip for AI computing hardware. Requires deep expertise in power distribution, semiconductor systems, and data center infrastructure with cross-functional leadership across hardware, firmware, and operations teams. Based in Austin or San Jose with flexible remote work options, offering competitive compensation with phantom equity.
Principal Hardware Diagnostics Engineer
Graphcore
Graphcore seeks a Principal Hardware Diagnostics Engineer to design and develop advanced diagnostics software for monitoring hardware health and diagnosing system-level issues across their AI infrastructure platforms. The role involves building diagnostics agents, tools, and analytics frameworks to help engineers and automation systems identify, isolate, and resolve hardware issues at both the blade level and the rack-scale cluster level.
Graduate Electrical Engineer
Graphcore
Graphcore seeks a recent graduate in Electrical or Computer Engineering to join their Server Design team in Austin, developing next-generation AI infrastructure for hyperscale data centers. The role involves designing cutting-edge AI servers and power distribution systems while collaborating with experienced engineers across multiple disciplines to deliver high-performance, power-efficient solutions for modern AI workloads.
Distinguished Engineer - Inference Serving Network and Storage
Graphcore
Distinguished Engineer role leading end-to-end networking and storage architecture for large-scale AI inference serving systems at Graphcore. Responsible for defining serving fabric design, KV cache management, storage strategies, and driving cross-functional technical decisions that impact product differentiation and competitive advantage. Chief technologist position requiring expert-level technical leadership, strategic thinking, and influence across organizational boundaries.
BMC Engineer
Graphcore
This role focuses on developing and maintaining OpenBMC software stacks for AI server management hardware, including kernel drivers for ASPEED devices and Redfish API enhancements. The engineer will work cross-functionally with firmware, hardware, and business teams to deliver enterprise-grade baseboard and rack management solutions for Graphcore's AI compute infrastructure. This is a systems-level engineering position critical to the deployment of large-scale AI datacenter environments.
Asset & Inventory Operations Coordinator
Graphcore
Graphcore seeks an Asset & Inventory Operations Coordinator to manage end-to-end material lifecycle for R&D infrastructure, including GPU components, custom accelerators, and data center equipment across global lab networks. The role involves building frameworks, tools, and workflows to ensure tracking, delivery, compliance, and operational excellence for engineering labs and infrastructure systems.
Asset & Inventory Operations Coordinator
Graphcore
Graphcore seeks an Asset & Inventory Operations Coordinator to manage end-to-end material lifecycle for R&D infrastructure, supporting engineering labs and data centers with component tracking and delivery. The role involves building frameworks, tools, and workflows for asset management, compliance, and operational excellence across a global lab network. This is a critical operational position supporting advanced AI computing hardware infrastructure at a SoftBank Group company.
2026 Summer Intern - Software Engineering - Drivers
Graphcore
Graphcore seeks a 2026 Summer Intern to join the Drivers and Utilities team, developing kernel and user-space software that enables customers to maximize performance from Graphcore's AI hardware (IPU). The role involves writing production-quality code with comprehensive testing, participating in code reviews and technical design discussions, and maintaining CI/CD systems as part of a cross-functional team supporting the Poplar SDK.
2026 Graduate Software Engineer - Neuro Engine Modelling
Graphcore
Graduate software engineer role focused on modeling and simulating next-generation AI hardware at Graphcore. You'll contribute to the design, implementation, and testing of hardware-modeling software using C/C++/Python while working in agile scrum teams alongside hardware and framework engineers. The position involves integration of modeling capabilities into ML stacks and supporting distributed development environments.
2026 Graduate Software Engineer - Drivers
Graphcore
A graduate-level software engineering role focused on developing kernel and user-space drivers for Graphcore's AI accelerator hardware (IPU). The position involves building system software, writing well-tested code, and collaborating with hardware and ML teams to optimize performance and enable customer applications on Graphcore's specialized AI compute platforms.
2026 Graduate Software Engineer - DevOps
Graphcore
A graduate-level DevOps engineering role at Graphcore focused on building infrastructure and tools for AI compute systems. You'll work on scaling infrastructure, CI/CD pipelines, and deployment processes for machine learning software components while gaining experience with distributed systems and HPC platforms.
2026 Graduate IT Infrastructure Engineer
Graphcore
A graduate-level IT Infrastructure Engineer role at Graphcore focused on designing, building, and maintaining enterprise-scale infrastructure that supports AI hardware and workloads. The position involves hands-on experience with on-premise systems, cloud platforms, networking, automation, and infrastructure-as-code across an organization developing next-generation AI compute hardware.
Staff Fullstack Engineer - Onboard Experience
Wayve
Lead fullstack engineer for the autonomous vehicle onboard experience, responsible for architecting and maintaining real-time in-vehicle web visualization that displays AI perception and decision-making to operators. Role spans frontend, backend, and embedded systems while balancing rapid iteration with safety-critical reliability and performance constraints.
Software Integration Engineer (6 months Contract)
Wayve
Wayve seeks a Software Integration Engineer to bring up and validate their autonomous driving AI software stack on customer hardware platforms including NVIDIA Drive and Qualcomm Ride SoCs. The role focuses on porting Linux/QNX/AUTOSAR systems, integrating drivers and middleware, and collaborating with verification and OEM teams to ensure reliable CI/CD and test infrastructure integration for self-driving vehicles.
Software Engineer - System Performance, Robot Software
Wayve
Join Wayve's Robot Software team as a System Performance engineer responsible for building observability, profiling tools, and infrastructure that optimize software performance across the autonomous vehicle fleet. You will enable model developers and scientists to iterate quickly by ensuring reliable, efficient compute utilization and providing the tools needed to detect and resolve performance bottlenecks in onboard systems.
Site Reliability Engineer
Wayve
Site Reliability Engineer at Wayve focused on ensuring operational excellence and reliability of autonomous vehicle systems. The role emphasizes infrastructure automation, monitoring, and metrics mastery to keep AI-driven autonomous vehicles robust and resilient in production environments. This position bridges ML systems with enterprise infrastructure, requiring strong DevOps and systems engineering expertise.
Senior Site Reliability Engineer, Vehicle SW
Wayve
Senior Site Reliability Engineer role at Wayve focused on maintaining reliability, observability, and safety of autonomous vehicle fleets operating on public roads. The position bridges software, hardware, and operations to transform real-world incidents into lasting improvements while supporting fleet scaling and faster deployment cycles.
Senior Fullstack Engineer - Data Enrichment
Wayve
Senior Fullstack Engineer role at Wayve (autonomous driving AI company) to build an internal web application for browsing computer vision models and exploring results. You'll partner with internal stakeholders to design and ship an MVP quickly, then iterate on user feedback while integrating with Databricks and existing backend services.
Release Manager
Wayve
Release Manager at Wayve responsible for coordinating autonomous driving software releases with reliability, stability, and predictable timing. The role requires balancing fast-paced innovation with disciplined release practices, change management, and cross-team coordination to support fleet operations and product development.
Principal Application Software Engineer - Relocation to Tokyo
Wayve
Wayve seeks a Principal Application Software Engineer to lead the localization of their autonomous driving AI stack for the Japanese market. The role focuses on hardware bring-up, OS porting (Linux, QNX, AUTOSAR), and validation across diverse automotive SoC platforms while collaborating with cross-functional teams on OEM integration.
Field Engineer
Wayve
Wayve seeks a hands-on Field Engineer to diagnose and resolve autonomous vehicle software and hardware issues in their fleet operations. The role requires rapid fault triage with 5-minute response times, collaboration with global teams, and adherence to strict safety protocols while helping establish new operational processes.
Customer Integration Engineer
Wayve
This Customer Integration Engineer role supports Wayve's autonomous driving AI technology throughout the customer development lifecycle, from integration to validation. The position requires hands-on technical expertise to diagnose issues, bridge customer and internal engineering teams, and ensure seamless deployment of mapless AI driver systems into vehicles. Success demands strong system understanding, real-time troubleshooting skills, and close collaboration with automotive engineering teams.
Application Software Engineer - Relocation to Tokyo
Wayve
Wayve seeks an Application Software Engineer to localize and deploy their Embodied AI autonomous driving stack on Japanese customer hardware platforms. The role focuses on software bring-up, hardware integration, and validation across diverse automotive SoCs and operating systems (Linux, QNX, AUTOSAR) in collaboration with cross-functional teams.
Solutions Engineer
Speechmatics
A customer-facing Solutions Engineer role focused on speech technology evaluation and adoption. Requires hands-on Python programming, cloud/infrastructure deployment experience, and strong technical communication skills to guide customers through demos, PoCs, and integrations with Speechmatics' speech recognition platform.
Solutions Engineer
Speechmatics
This Solutions Engineer role focuses on helping customers evaluate and adopt Speechmatics' speech recognition technology through technical pre-sales and proof-of-concept delivery. The position requires hands-on Python programming, cloud/infrastructure deployment experience, and strong customer-facing communication skills to guide technical evaluation from discovery through implementation.
Solutions Engineer
Speechmatics
A Solutions Engineer role at Speechmatics focused on speech technology adoption, requiring hands-on technical expertise in Python, cloud deployment, and API integrations. The position bridges sales and engineering, demanding strong customer-facing skills alongside the ability to build demos, write code, and design technical solutions for speech recognition implementations.
Solutions Engineer
Speechmatics
Solutions Engineer role focused on helping customers evaluate and implement Speechmatics' speech recognition technology. Requires hands-on technical expertise in Python, cloud/Linux deployments, and API integrations, combined with strong customer-facing communication skills and ability to build demos and PoCs from discovery through validation.
Software Engineer
Speechmatics
Speechmatics is seeking a Software Engineer to design and build scalable, resilient systems that integrate AI transcription models into production. The role emphasizes system architecture, performance optimization, and cross-team collaboration. It requires strong engineering fundamentals; prior audio/speech domain experience is not needed.