Reinforcement Learning Jobs
RL and RLHF engineering roles at robotics, gaming, and AI labs.
50 open positions
Research Scientist (Embodied AI & World Models)
Graphcore
Graphcore seeks an experienced Research Scientist to advance embodied AI and world models for edge/low-power scenarios including robotics and autonomous driving. The role involves developing hardware-aware AI algorithms and deploying multimodal models while contributing to fundamental and applied research published at top-tier ML conferences. You'll join a collaborative research team across UK locations working on efficient compute, model scaling, and next-generation AI architectures.
Research Engineer
Graphcore
Graphcore seeks a Research Engineer to advance AI compute through hardware-aware algorithms and implementations. The role combines machine learning expertise with strong software engineering and performance optimization skills to deliver impactful research across efficient training/inference, world models, and reinforcement learning. You'll collaborate with researchers to translate ideas into scalable implementations and contribute to publications at leading AI conferences.
Technical Product Manager - Robotaxi
Wayve
Technical Product Manager role at Wayve focused on defining and delivering Level 4 autonomous driving solutions powered by embodied AI. Requires deep expertise in autonomous vehicle systems, cross-functional collaboration with R&D and safety teams, and ability to translate cutting-edge AI innovation into market-ready products while managing customer relationships and regulatory frameworks.
Tech Lead Manager - Multi Modal Foundation Models (Language)
Wayve
Lead a team developing language grounding and reasoning capabilities for Wayve's multimodal foundation models used in autonomous driving. Drive foundational research at the intersection of large-scale pretraining, language understanding, and embodied agent alignment, with a focus on grounded understanding in real-world contexts.
Tech Lead - AI Validation Systems Engineer
Wayve
Tech Lead role at Wayve focused on building AI validation systems for autonomous driving. Responsible for designing test scenarios and safety assessments for end-to-end learned driving models, while leading a team of validation engineers. Bridges AI/ML expertise with systems engineering to establish trust and safety standards for the autonomous vehicle industry.
Strategic Partnership Manager
Wayve
Strategic Partnership Manager role at Wayve focused on commercializing autonomous driving technology through robotaxi deployments in London and Tokyo. The position requires managing partnerships with ride-hailing platforms and automotive OEMs while coordinating cross-functional teams to execute pilot launches and scale fleet operations. This is a business-focused role bridging Embodied AI technology with real-world mobility applications.
Software Engineer - OS & Kernel, Robot Software
Wayve
Join Wayve's OS & Kernel team to develop and maintain a custom Linux distribution powering their autonomous vehicle fleet. You'll work on core system infrastructure that enables rapid iteration for model developers and ensures reliable on-road experimentation for autonomous driving capabilities. This role requires deep expertise in Linux kernel development, Yocto, and embedded systems to support Wayve's mission of advancing embodied AI technology.
Robotaxi Technical Operations
Wayve
This role bridges technical AI/autonomous driving development with field operations at Wayve, requiring someone to lead testing programs, demonstration campaigns, and on-road validation for robotaxi systems. The position demands senior-level experience translating complex technical strategies into executable operations while maintaining feedback loops between field teams and product development.
Robotaxi Safety Programme Lead
Wayve
Lead the safety programme for Wayve's robotaxi deployment across multiple cities, translating AI safety approaches and compliance requirements into scalable, structured programmes. Own day-to-day execution while coordinating across engineering, operations, legal, and partner teams to ensure credible and defensible safety operations at scale.
Register your interest for Wayve in Germany
Wayve
Wayve is expanding to Germany and seeking talented individuals to join their autonomous driving AI team. The company develops embodied AI technology and foundation models that enable vehicles to perceive and navigate complex environments. This is an interest registration for upcoming opportunities in their new German office.
Principal Machine Learning Engineer, App SW
Wayve
Principal ML Engineer role focused on developing state-of-the-art driving models for autonomous vehicles, spanning model architecture, data pipelines, and real-world deployment. Responsibilities include leading personalized and collaborative driving projects while collaborating across AI Platform, Simulation, and Robot Software teams to deliver scalable, production-ready systems.
Operational Safety Coordinator
Wayve
Wayve seeks an Operational Safety Coordinator to support autonomous vehicle fleet testing operations, monitoring safety performance, managing incidents, and maintaining compliance documentation. The role requires 2+ years in safety/operations with knowledge of ADAS and autonomous vehicle systems, reporting to the Operational Safety Manager in a safety-critical position.
Machine Learning Manager, App SW
Wayve
Wayve seeks an experienced technical leader to establish and manage a Japan-based Application Engineering team focused on localizing autonomous driving technology for the Japanese market. The role requires expertise in machine learning, computer vision, and robotics with strong leadership capabilities to operate independently across time zones while delivering tailored AV solutions for Japan's unique infrastructure and regulatory environment.
Machine Learning Engineer, AV Engineering
Wayve
Senior/Staff ML Engineer role focused on building end-to-end AI driver models for autonomous vehicles (L2-L4), with ownership of the full training lifecycle from data curation through on-road validation. The position emphasizes product-driven execution with direct impact on commercial deployments like the Nissan MVP, requiring expertise in safety-critical machine learning and rapid iteration cycles.
Field Engineer
Wayve
A hands-on Field Engineer role at Wayve focused on diagnosing and resolving vehicle software and hardware issues in autonomous driving systems. The position requires rapid fault triage, on-road testing support, and collaboration with safety teams to keep the fleet operational while contributing real-world feedback to improve AI performance.
Customer Program Manager
Wayve
This is a program management role at Wayve, an autonomous driving AI company, requiring leadership of OEM customer programs from development through production. The role demands deep automotive and embodied AI knowledge, along with strong project management capabilities including risk and change management across complex cross-functional initiatives.
Customer Integration Engineer
Wayve
Customer Integration Engineer at Wayve supporting autonomous driving AI technology deployment with customers. The role requires hands-on technical expertise in vehicle integration, real-time diagnostics, and serving as a bridge between customer engineering teams and internal Wayve engineers to ensure seamless deployment of their Embodied AI software.
Customer Integration Engineer
Wayve
Customer Integration Engineer supporting Wayve's autonomous driving AI technology through hands-on technical assistance during vehicle integration and validation phases. Acts as the critical bridge between customer engineering teams and Wayve's internal teams, performing real-time diagnostics and troubleshooting of AI system integration issues in complex automotive environments.
Applied Scientist, Controllable GAIA
Wayve
Join Wayve's Science team to develop controllable generative world models (GAIA) that simulate realistic multi-sensor environments for autonomous vehicle training. You'll advance next-generation autonomous driving by creating photoreal, physics-aware simulations that accelerate AI development beyond collecting real-world miles.
Application Software Engineer - Software Integration / Embedded Software
Wayve
This role focuses on bringing up Wayve's autonomous driving AI stack on diverse automotive hardware platforms (NVIDIA Drive, Qualcomm Ride) and operating systems (Linux, QNX, AUTOSAR). The engineer will handle software porting, driver integration, and hardware validation for US market deployment, requiring strong embedded systems and systems integration expertise.
Senior Research Scientist (Must be based in UK)
PolyAI
PolyAI is seeking a Senior Research Scientist to lead cutting-edge research on large language model post-training for conversational voice assistants. The role focuses on novel approaches including conversational reinforcement learning, audio-native LLMs, streaming turn-taking, and reasoning model distillation. This is a research-focused leadership position requiring expertise in LLMs and dialogue systems, based in the UK.
Applied Research Intern
Labelbox
Design and build evaluation systems and benchmarks for frontier LLMs and multimodal models, including reasoning, code, and agent-use tasks. Create post-training datasets and prototype RLHF/RLAIF/DPO-style training loops to measure and improve model performance on real-world tasks.
Applied Research Engineer
Labelbox
Develop cutting-edge systems for creating and leveraging high-quality human feedback data to train frontier AI models using techniques like RLHF and DPO. Design advanced methods to align human preferences with AI training processes and measure the quality and impact of human-in-the-loop data. Work at the intersection of applied research and engineering to solve critical data-centric challenges in modern AI development.
Senior Staff Machine Learning Engineer, Ads Quality
Instacart
Senior Staff ML Engineer role focused on leading Ads Quality initiatives at Instacart, requiring expertise in AI/ML models and optimization for a 4-sided marketplace. The position combines individual contribution with technical leadership, mentoring, and cross-functional collaboration to drive next-generation advertising solutions at scale.
Senior Machine Learning Engineer, Operations Research
Instacart
Senior ML Engineer role focused on Operations Research at Instacart's Logistics team, solving complex fulfillment problems including order batching, shopper routing, and real-time assignment. The position combines combinatorial optimization and mathematical programming with ML to optimize a multi-sided marketplace for customers, shoppers, and retailers at scale.
Senior Machine Learning Engineer II, Search & Recommendations Ranking
Instacart
Lead the development of multi-task, multi-objective ranking models that unify search, recommendations, and ads across Instacart's platform. Build foundational ML systems optimized for incremental value (GTV, basket lift, retention) with LLM-enhanced retrieval and value-aware constraints. Partner with engineers and PMs to deploy ranking backbones powering the entire shopping experience at scale.
Senior Applied Scientist II, Ads Optimization
Instacart
Lead the algorithmic design of Instacart's real-time bidding and budget optimization systems that handle millions of daily decisions for their $1B+ ads business. Apply control theory, constrained optimization, and auction economics to balance advertiser goals, user experience, and platform revenue. Own systems end-to-end from mathematical formulation through production deployment and impact measurement.
Machine Learning Manager, Feed Relevance
Reddit
Lead a high-impact ML engineering team at Reddit responsible for feed relevance, ranking, and content discovery systems serving 120M+ daily users. Define technical vision and strategy for sophisticated relevance modeling while coaching team development and collaborating across product, infrastructure, and business functions. This is a technical management role requiring deep ML expertise combined with proven leadership capabilities in shipping production recommendation systems.
Machine Learning Engineer
Reddit
Reddit seeks a Machine Learning Engineer to build industrial-grade models for critical ML tasks, specializing in recommender systems and NLP with advanced neural networks and feature engineering. The role requires 3+ years of experience with expertise in Python, SQL, distributed computing (Spark/Beam), and cloud platforms (GCP/AWS); full-time remote work is available as an option.
Senior Software Engineer, Chess
Duolingo
Senior Software Engineer role building Duolingo's chess learning platform from the ground up. The position involves designing backend services, collaborating with AI researchers to create adaptive learning experiences, and leveraging chess domain knowledge to shape product features. Ideal candidate combines strong backend engineering skills with active chess playing experience and interest in AI-driven educational products.
Senior Machine Learning Engineer II, NLU & Agentic AI
Moveworks
Senior Machine Learning Engineer role focused on advancing NLU and agentic AI capabilities at Moveworks. Requires expertise in LLM fine-tuning, conversational agents, compound AI systems, and production-grade ML infrastructure to build reliable enterprise copilot experiences. Strong emphasis on balancing model quality with latency, reliability, and end-to-end system performance.
Sr. Director of Product, Research and Training Infrastructure
CoreWeave
Sr. Director of Product role leading the Research Training Infrastructure stack at CoreWeave, responsible for product strategy and engineering execution of orchestration tools (Slurm on Kubernetes) powering frontier AI model training and post-training at scale. This executive position bridges HPC infrastructure and cloud-native tools to enable leading AI research labs to build and iterate on cutting-edge models efficiently.
Senior Research Scientist, Reward Models
Anthropic
Lead research scientist role focused on advancing reward modeling techniques for large language models, particularly RLHF training and LLM-based evaluation methods. The position emphasizes developing novel architectures, mitigating reward specification gaming, and translating research into production improvements while driving AI alignment and safety initiatives.
Research Scientist, Frontier Red Team (Emerging Risks)
Anthropic
This role involves building an independent research program to identify and evaluate emerging societal risks from advanced AI systems, with a focus on integration risks rather than catastrophic scenarios. You will conduct red team evaluations, design experiments, and translate findings into actionable insights for safer AI development. The position requires deep technical expertise in AI capabilities combined with strategic thinking about real-world deployment risks.
Research Product Manager, Model Behaviors
Anthropic
This is a senior product management role at Anthropic focused on shaping Claude's behaviors and alignment through reinforcement learning and model finetuning. The ideal candidate combines 5+ years of conversational AI product experience with deep ML knowledge, user empathy, and the judgment to navigate nuanced AI safety and behavior tradeoffs. The role requires translating alignment research into scaled product improvements while coordinating across research, product, and safeguards teams.
Research Lead, Training Insights
Anthropic
Lead research strategy for measuring and characterizing AI model capabilities across training and deployment lifecycles at Anthropic. Drive original evaluation methodologies, lead a small team of researchers, and shape how the company evaluates and communicates model performance to internal and external stakeholders.
Research Engineer, Virtual Collaborator (Cowork)
Anthropic
Anthropic seeks a senior Research Engineer to design and implement reinforcement learning pipelines that fine-tune Claude for virtual collaborator workflows in enterprise settings. The role bridges research and product, requiring deep expertise in RL environments, reward modeling, and data generation platforms to train Claude on realistic organizational tasks while maintaining alignment with product requirements.
Research Engineer, Universes
Anthropic
Anthropic seeks a Research Engineer to design and build next-generation training environments for agentic AI systems capable of complex, long-horizon tasks. The role balances fundamental RL research with production engineering, requiring expertise in environment design, evaluation methodologies, and the ability to rapidly iterate across research and ML stacks. Ideal candidates demonstrate a clear orientation toward impact, strong research taste, and a commitment to developing safe and capable AI systems.
Research Engineer / Scientist, Alignment Science - London
Anthropic
Anthropic seeks a Research Engineer/Scientist to conduct exploratory experimental research on AI safety and alignment, focusing on understanding and steering powerful AI systems. The role involves building elegant ML experiments, stress-testing alignment under adversarial scenarios, and developing AI control methods, collaborating with interpretability and red team researchers.
Research Engineer / Scientist, Alignment Science
Anthropic
Anthropic seeks a Research Engineer/Scientist to conduct exploratory experimental research on AI safety and alignment, focusing on risks from advanced AI systems. The role combines scientific rigor with engineering expertise to build elegant experiments that help understand and steer powerful AI behavior, particularly through scalable oversight and AI control techniques. This position collaborates across interpretability, fine-tuning, and red team functions to ensure AI systems remain helpful, honest, and harmless at human-level capabilities and beyond.
Research Engineer, Reward Models Platform
Anthropic
Build infrastructure and tooling for Anthropic's reward models platform, automating research workflows around rubric development, human feedback analysis, and reward evaluation. This role bridges research and engineering, requiring strong Python fundamentals and experience with ML systems, to enable faster iteration on reward methodologies used for training AI models.
Research Engineer / Research Scientist, Vision
Anthropic
Anthropic seeks a senior Research Engineer with 7+ years of ML and computer vision expertise to advance Claude's visual and spatial reasoning capabilities. The role involves developing vision language model architectures, creating multimodal datasets and evaluations, and collaborating across teams to solve real-world customer challenges. This is a full-stack research position spanning pretraining, reinforcement learning, and deployment-time techniques.
Research Engineer/Research Scientist, Audio
Anthropic
Anthropic seeks a Research Engineer/Scientist to advance audio capabilities in large language models, focusing on speech understanding, generation, and multimodal audio integration. The role requires a 50/50 split between research and engineering work across the full audio ML stack, from signal processing to large-scale model training and deployment.
Research Engineer, Production Model Post-Training
Anthropic
A hybrid research-engineering role at Anthropic focused on implementing and optimizing post-training techniques for frontier language models. The position requires strong software engineering skills combined with ML expertise to translate cutting-edge alignment research into production systems, with direct responsibility for model quality and safety.
Research Engineer, Production Model Post-Training
Anthropic
Research Engineer focused on implementing and optimizing post-training techniques for Claude models at scale, including Constitutional AI and RLHF methodologies. The role requires strong software engineering skills, distributed systems expertise, and the ability to translate cutting-edge research into production-ready implementations while managing production incidents.
Research Engineer, Performance RL
Anthropic
Research Engineer role on Anthropic's Code RL team focused on improving Claude's ability to write correct, optimized code for accelerators like GPUs. You'll design RL environments and evaluation metrics, conduct experiments, and deliver work into production training runs, requiring deep expertise in both reinforcement learning and accelerator performance optimization.
Research Engineer, Machine Learning (Reinforcement Learning)
Anthropic
Anthropic seeks a Research Engineer to advance reinforcement learning capabilities for large language models, including autonomous systems, code generation, and agentic tool use. This hybrid research-engineering role involves architecting scalable RL infrastructure, designing novel training environments and evaluations, and implementing fundamental research at production scale. The position requires both deep technical expertise in RL and strong systems engineering skills to push the boundaries of AI capabilities.
Research Engineer, Machine Learning (Reinforcement Learning)
Anthropic
Anthropic seeks a Research Engineer to advance reinforcement learning capabilities in large language models, focusing on agentic systems, tool use, and code generation. The role blends research innovation with engineering excellence, requiring design and implementation of novel RL training environments, infrastructure optimization, and collaboration across research and production teams to scale cutting-edge systems.
Research Engineer, Frontier Red Team (Autonomy)
Anthropic
This role focuses on building and evaluating autonomous AI systems to understand and defend against adversarial use of advanced AI. You'll develop model organisms of self-improving systems and create defensive agents, directly influencing Anthropic's safety research and AI policy at a critical juncture in AI development.
Research Engineer, Environment Scaling
Anthropic
This role focuses on building and scaling RL training environments for LLM adaptation across new domains and use cases. The ideal candidate combines ML research expertise in fine-tuning and reward design with strong project management and vendor relationship skills to deliver end-to-end capability improvements to Anthropic's models.