Reinforcement Learning Jobs

RL and RLHF engineering roles at robotics, gaming, and AI labs.

50 open positions

Research Scientist (Embodied AI & World Models)

Graphcore

Graphcore seeks an experienced Research Scientist to advance embodied AI and world models for edge/low-power scenarios including robotics and autonomous driving. The role involves developing hardware-aware AI algorithms and deploying multimodal models while contributing to fundamental and applied research published at top-tier ML conferences. You'll join a collaborative research team across UK locations working on efficient compute, model scaling, and next-generation AI architectures.

COMPUTER VISION · MACHINE LEARNING · REINFORCEMENT LEARNING · SYSTEM DESIGN

Research Engineer

Graphcore

Bristol, UK; Cambridge, UK; London, UK · mid · 19h ago

Graphcore seeks a Research Engineer to advance AI compute through hardware-aware algorithms and implementations. The role combines machine learning expertise with strong software engineering and performance optimization skills to deliver impactful research across efficient training/inference, world models, and reinforcement learning. You'll collaborate with researchers to translate ideas into scalable implementations and contribute to publications at leading AI conferences.

MACHINE LEARNING · REINFORCEMENT LEARNING · SYSTEM DESIGN

Technical Product Manager - Robotaxi

Wayve

Sunnyvale · senior · 19h ago

Technical Product Manager role at Wayve focused on defining and delivering Level 4 autonomous driving solutions powered by embodied AI. Requires deep expertise in autonomous vehicle systems, cross-functional collaboration with R&D and safety teams, and the ability to translate cutting-edge AI innovation into market-ready products while managing customer relationships and regulatory frameworks.

COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN · MACHINE LEARNING

Tech Lead Manager - Multi Modal Foundation Models (Language)

Wayve

Sunnyvale · senior · 19h ago

Lead a team developing language grounding and reasoning capabilities for Wayve's multimodal foundation models used in autonomous driving. Drive foundational research at the intersection of large-scale pretraining, language understanding, and embodied agent alignment, with focus on grounded understanding in real-world contexts.

NLP LLMS · MACHINE LEARNING · COMPUTER VISION · FINE TUNING · +2 more

Tech Lead - AI Validation Systems Engineer

Wayve

Sunnyvale · senior · 19h ago

Tech Lead role at Wayve focused on building AI validation systems for autonomous driving. Responsible for designing test scenarios and safety assessments for end-to-end learned driving models, while leading a team of validation engineers. Bridges AI/ML expertise with systems engineering to establish trust and safety standards for the autonomous vehicle industry.

MACHINE LEARNING · COMPUTER VISION · REINFORCEMENT LEARNING · EVALS · +1 more

Strategic Partnership Manager

Wayve

London · mid · 19h ago

Strategic Partnership Manager role at Wayve focused on commercializing autonomous driving technology through robotaxi deployments in London and Tokyo. The position requires managing partnerships with ride-hailing platforms and automotive OEMs while coordinating cross-functional teams to execute pilot launches and scale fleet operations. This is a business-focused role bridging Embodied AI technology with real-world mobility applications.

COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN

Software Engineer - OS & Kernel, Robot Software

Wayve

Sunnyvale · mid · 19h ago

Join Wayve's OS & Kernel team to develop and maintain a custom Linux distribution powering their autonomous vehicle fleet. You'll work on core system infrastructure that enables rapid iteration for model developers and ensures reliable on-road experimentation for autonomous driving capabilities. This role requires deep expertise in Linux kernel development, Yocto, and embedded systems to support Wayve's mission of advancing embodied AI technology.

SYSTEM DESIGN · COMPUTER VISION · REINFORCEMENT LEARNING

Robotaxi Technical Operations

Wayve

London · senior · 19h ago

This role bridges technical AI/autonomous driving development with field operations at Wayve, requiring someone to lead testing programs, demonstration campaigns, and on-road validation for robotaxi systems. The position demands senior-level experience translating complex technical strategies into executable operations while maintaining feedback loops between field teams and product development.

COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN

Robotaxi Safety Programme Lead

Wayve

London; Sunnyvale · senior · 19h ago

Lead the safety programme for Wayve's robotaxi deployment across multiple cities, translating AI safety approaches and compliance requirements into scalable, structured programmes. Own day-to-day execution while coordinating across engineering, operations, legal, and partner teams to ensure credible and defensible safety operations at scale.

COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN

Register your interest for Wayve in Germany

Wayve

Germany · mid · 19h ago

Wayve is expanding to Germany and seeking talented individuals to join their autonomous driving AI team. The company develops embodied AI technology and foundation models that enable vehicles to perceive and navigate complex environments. This is an interest registration for upcoming opportunities in their new German office.

COMPUTER VISION · MACHINE LEARNING · REINFORCEMENT LEARNING

Principal Machine Learning Engineer, App SW

Wayve

Sunnyvale · principal · 19h ago

Principal ML Engineer role focused on developing state-of-the-art driving models for autonomous vehicles, spanning model architecture, data pipelines, and real-world deployment. Responsibilities include leading personalized and collaborative driving projects while collaborating across AI Platform, Simulation, and Robot Software teams to deliver scalable, production-ready systems.

MACHINE LEARNING · COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN · +1 more

Operational Safety Coordinator

Wayve

Japan · junior · 19h ago

Wayve seeks an Operational Safety Coordinator to support autonomous vehicle fleet testing operations, monitoring safety performance, managing incidents, and maintaining compliance documentation. The role requires 2+ years in safety/operations with knowledge of ADAS and autonomous vehicle systems, reporting to the Operational Safety Manager in a safety-critical position.

COMPUTER VISION · REINFORCEMENT LEARNING

Machine Learning Manager, App SW

Wayve

Japan · senior · 19h ago

Wayve seeks an experienced technical leader to establish and manage a Japan-based Application Engineering team focused on localizing autonomous driving technology for the Japanese market. The role requires expertise in machine learning, computer vision, and robotics with strong leadership capabilities to operate independently across time zones while delivering tailored AV solutions for Japan's unique infrastructure and regulatory environment.

MACHINE LEARNING · COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN

Machine Learning Engineer, AV Engineering

Wayve

Israel · senior · 19h ago

Senior/Staff ML Engineer role focused on building end-to-end AI driver models for autonomous vehicles (L2-L4), with ownership of the full training lifecycle from data curation through on-road validation. The position emphasizes product-driven execution with direct impact on commercial deployments like the Nissan MVP, requiring expertise in safety-critical machine learning and rapid iteration cycles.

MACHINE LEARNING · COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN

Field Engineer

Wayve

Israel · junior · 19h ago

A hands-on Field Engineer role at Wayve focused on diagnosing and resolving vehicle software and hardware issues in autonomous driving systems. The position requires rapid fault triage, on-road testing support, and collaboration with safety teams to keep the fleet operational while contributing real-world feedback to improve AI performance.

COMPUTER VISION · SYSTEM DESIGN · REINFORCEMENT LEARNING

Customer Program Manager

Wayve

Germany · mid · 19h ago

This is a program management role at Wayve, an autonomous driving AI company, requiring leadership of OEM customer programs from development through production. The role demands deep automotive and embodied AI knowledge, along with strong project management capabilities including risk and change management across complex cross-functional initiatives.

COMPUTER VISION · REINFORCEMENT LEARNING · SYSTEM DESIGN

Customer Integration Engineer

Wayve

Detroit · mid · 19h ago

Customer Integration Engineer at Wayve supporting the deployment of its autonomous driving AI technology with customers. The role requires hands-on technical expertise in vehicle integration and real-time diagnostics, serving as a bridge between customer engineering teams and internal Wayve engineers to ensure seamless deployment of Wayve's Embodied AI software.

COMPUTER VISION · SYSTEM DESIGN · REINFORCEMENT LEARNING

Customer Integration Engineer

Wayve

Japan · mid · 19h ago

Customer Integration Engineer supporting Wayve's autonomous driving AI technology through hands-on technical assistance during vehicle integration and validation phases. Acts as the critical bridge between customer engineering teams and Wayve's internal teams, performing real-time diagnostics and troubleshooting of AI system integration issues in complex automotive environments.

COMPUTER VISION · SYSTEM DESIGN · REINFORCEMENT LEARNING

Applied Scientist, Controllable GAIA

Wayve

London · mid · 19h ago

Join Wayve's Science team to develop controllable generative world models (GAIA) that simulate realistic multi-sensor environments for autonomous vehicle training. You'll advance next-generation autonomous driving by creating photoreal, physics-aware simulations that accelerate AI development beyond collecting real-world miles.

COMPUTER VISION · MACHINE LEARNING · REINFORCEMENT LEARNING

Application Software Engineer - Software Integration / Embedded Software

Wayve

Sunnyvale · mid · 19h ago

This role focuses on bringing up Wayve's autonomous driving AI stack on diverse automotive hardware platforms (NVIDIA Drive, Qualcomm Ride) and operating systems (Linux, QNX, AUTOSAR). The engineer will handle software porting, driver integration, and hardware validation for US market deployment, requiring strong embedded systems and systems integration expertise.

COMPUTER VISION · SYSTEM DESIGN · REINFORCEMENT LEARNING

Senior Research Scientist (Must be based in UK)

PolyAI

London, United Kingdom · senior · 19h ago

PolyAI is seeking a Senior Research Scientist to lead cutting-edge research on large language model post-training for conversational voice assistants. The role focuses on novel approaches including conversational reinforcement learning, audio-native LLMs, streaming turn-taking, and reasoning model distillation. This is a research-focused leadership position requiring expertise in LLMs and dialogue systems, based in the UK.

NLP LLMS · FINE TUNING · REINFORCEMENT LEARNING · EVALS

Applied Research Intern

Labelbox

San Francisco Bay Area · junior · 19h ago

Design and build evaluation systems and benchmarks for frontier LLMs and multimodal models, including reasoning, code, and agent-use tasks. Create post-training datasets and prototype RLHF/RLAIF/DPO-style training loops to measure and improve model performance on real-world tasks.

NLP LLMS · EVALS · FINE TUNING · REINFORCEMENT LEARNING

Applied Research Engineer

Labelbox

San Francisco Bay Area · mid · 19h ago

Develop cutting-edge systems for creating and leveraging high-quality human feedback data to train frontier AI models using techniques like RLHF and DPO. Design advanced methods to align human preferences with AI training processes and measure the quality and impact of human-in-the-loop data. Work at the intersection of applied research and engineering to solve critical data-centric challenges in modern AI development.

NLP LLMS · REINFORCEMENT LEARNING · FINE TUNING · EVALS · +1 more

Senior Staff Machine Learning Engineer, Ads Quality

Instacart

United States - Remote · staff · 19h ago

Senior Staff ML Engineer role focused on leading Ads Quality initiatives at Instacart, requiring expertise in AI/ML models and optimization for a 4-sided marketplace. The position combines individual contribution with technical leadership, mentoring, and cross-functional collaboration to drive next-generation advertising solutions at scale.

MACHINE LEARNING · SYSTEM DESIGN · REINFORCEMENT LEARNING

Senior Machine Learning Engineer, Operations Research

Instacart

United States - Remote · senior · 19h ago

Senior ML Engineer role focused on Operations Research at Instacart's Logistics team, solving complex fulfillment problems including order batching, shopper routing, and real-time assignment. The position combines combinatorial optimization and mathematical programming with ML to optimize a multi-sided marketplace for customers, shoppers, and retailers at scale.

MACHINE LEARNING · MATH FOR AI · REINFORCEMENT LEARNING · SYSTEM DESIGN

Senior Machine Learning Engineer II, Search & Recommendations Ranking

Instacart

US - Remote · staff · 19h ago

Lead the development of multi-task, multi-objective ranking models that unify search, recommendations, and ads across Instacart's platform. Build foundational ML systems optimized for incremental value (GTV, basket lift, retention) with LLM-enhanced retrieval and value-aware constraints. Partner with engineers and PMs to deploy ranking backbones powering the entire shopping experience at scale.

MACHINE LEARNING · NLP LLMS · REINFORCEMENT LEARNING · SYSTEM DESIGN

Senior Applied Scientist II, Ads Optimization

Instacart

United States - Remote · senior · 19h ago

Lead the algorithmic design of Instacart's real-time bidding and budget optimization systems that handle millions of daily decisions for their $1B+ ads business. Apply control theory, constrained optimization, and auction economics to balance advertiser goals, user experience, and platform revenue. Own systems end-to-end from mathematical formulation through production deployment and impact measurement.

MACHINE LEARNING · REINFORCEMENT LEARNING · MATH FOR AI · SYSTEM DESIGN

Machine Learning Manager, Feed Relevance

Reddit

Remote - United States · senior · 19h ago

Lead a high-impact ML engineering team at Reddit responsible for feed relevance, ranking, and content discovery systems serving 120M+ daily users. Define technical vision and strategy for sophisticated relevance modeling while coaching team development and collaborating across product, infrastructure, and business functions. This is a technical management role requiring deep ML expertise combined with proven leadership capabilities in shipping production recommendation systems.

MACHINE LEARNING · SYSTEM DESIGN · REINFORCEMENT LEARNING

Machine Learning Engineer

Reddit

New York City, NY · mid · 19h ago

Reddit seeks a Machine Learning Engineer to build industrial-grade models for critical ML tasks, specializing in recommender systems and NLP with advanced neural networks and feature engineering. The role requires 3+ years of experience and expertise in Python, SQL, distributed computing (Spark/Beam), and cloud platforms (GCP/AWS); a fully remote option is available.

MACHINE LEARNING · NLP LLMS · SQL DATA ENG · REINFORCEMENT LEARNING

Senior Software Engineer, Chess

Duolingo

New York, NY · senior · 19h ago

Senior Software Engineer role building Duolingo's chess learning platform from the ground up. The position involves designing backend services, collaborating with AI researchers to create adaptive learning experiences, and leveraging chess domain knowledge to shape product features. Ideal candidate combines strong backend engineering skills with active chess playing experience and interest in AI-driven educational products.

REINFORCEMENT LEARNING · SYSTEM DESIGN · MACHINE LEARNING

Senior Machine Learning Engineer II, NLU & Agentic AI

Moveworks

Mountain View, CA · staff · 19h ago

Senior Machine Learning Engineer role focused on advancing NLU and agentic AI capabilities at Moveworks. Requires expertise in LLM fine-tuning, conversational agents, compound AI systems, and production-grade ML infrastructure to build reliable enterprise copilot experiences. Strong emphasis on balancing model quality with latency, reliability, and end-to-end system performance.

NLP LLMS · AGENTS · FINE TUNING · EVALS · +2 more

Sr. Director of Product, Research and Training Infrastructure

CoreWeave

Livingston, NJ / New York, NY / Sunnyvale, CA / Bellevue, WA · principal · 19h ago

Sr. Director of Product role leading the Research Training Infrastructure stack at CoreWeave, responsible for product strategy and engineering execution of orchestration tools (Slurm on Kubernetes) powering frontier AI model training and post-training at scale. This executive position bridges HPC infrastructure and cloud-native tools to enable leading AI research labs to build and iterate on cutting-edge models efficiently.

MLOPS INFRA · REINFORCEMENT LEARNING · NLP LLMS · SYSTEM DESIGN

Senior Research Scientist, Reward Models

Anthropic

Remote-Friendly (Travel Required) | San Francisco, CA · Remote · senior · 2d ago

Lead research scientist role focused on advancing reward modeling techniques for large language models, particularly RLHF training and LLM-based evaluation methods. The position emphasizes developing novel architectures, mitigating reward specification gaming, and translating research into production improvements while driving AI alignment and safety initiatives.

NLP LLMS · REINFORCEMENT LEARNING · FINE TUNING · EVALS · +1 more

Research Scientist, Frontier Red Team (Emerging Risks)

Anthropic

San Francisco, CA · senior · 2d ago

This role involves building an independent research program to identify and evaluate emerging societal risks from advanced AI systems, with focus on integration risks rather than catastrophic scenarios. You will conduct red team evaluations, design experiments, and translate findings into actionable insights for safer AI development. The position requires deep technical expertise in AI capabilities combined with strategic thinking about real-world deployment risks.

NLP LLMS · AGENTS · EVALS · REINFORCEMENT LEARNING

Research Product Manager, Model Behaviors

Anthropic

San Francisco, CA | New York City, NY · senior · 2d ago

This is a senior product management role at Anthropic focused on shaping Claude's behaviors and alignment through reinforcement learning and model fine-tuning. The ideal candidate combines 5+ years of conversational AI product experience with deep ML knowledge, user empathy, and the judgment to navigate nuanced AI safety and behavior tradeoffs. The role requires translating alignment research into scaled product improvements while coordinating across research, product, and safeguards teams.

NLP LLMS · FINE TUNING · REINFORCEMENT LEARNING · EVALS

Research Lead, Training Insights

Anthropic

Remote-Friendly (Travel Required) | San Francisco, CA | New York City, NY · Remote · senior · 2d ago

Lead research strategy for measuring and characterizing AI model capabilities across training and deployment lifecycles at Anthropic. Drive original evaluation methodologies, lead a small team of researchers, and shape how the company evaluates and communicates model performance to internal and external stakeholders.

NLP LLMS · EVALS · REINFORCEMENT LEARNING · MACHINE LEARNING · +1 more

Research Engineer, Virtual Collaborator (Cowork)

Anthropic

New York City, NY; San Francisco, CA; Seattle, WA · senior · 2d ago

Anthropic seeks a senior Research Engineer to design and implement reinforcement learning pipelines that fine-tune Claude for virtual collaborator workflows in enterprise settings. The role bridges research and product, requiring deep expertise in RL environments, reward modeling, and data generation platforms to train Claude on realistic organizational tasks while maintaining alignment with product requirements.

NLP LLMS · REINFORCEMENT LEARNING · FINE TUNING · EVALS · +1 more

Research Engineer, Universes

Anthropic

Remote-Friendly (Travel Required) | San Francisco, CA | Seattle, WA | New York City, NY · Remote · mid · 2d ago

Anthropic seeks a Research Engineer to design and build next-generation training environments for agentic AI systems capable of complex, long-horizon tasks. The role balances fundamental RL research with production engineering, requiring expertise in environment design, evaluation methodologies, and the ability to rapidly iterate across research and ML stacks. Ideal candidates demonstrate high impact-orientation, strong research taste, and commitment to developing safe and capable AI systems.

REINFORCEMENT LEARNING · AGENTS · EVALS · MACHINE LEARNING · +1 more

Research Engineer / Scientist, Alignment Science - London

Anthropic

London, UK · mid · 2d ago

Anthropic seeks a Research Engineer/Scientist to conduct exploratory experimental research on AI safety and alignment, focusing on understanding and steering powerful AI systems. The role involves building elegant ML experiments, stress-testing alignment under adversarial scenarios, and developing AI control methods, collaborating with interpretability and red team researchers.

MACHINE LEARNING · REINFORCEMENT LEARNING · NLP LLMS · EVALS

Research Engineer / Scientist, Alignment Science

Anthropic

San Francisco, CA · mid · 2d ago

Anthropic seeks a Research Engineer/Scientist to conduct exploratory experimental research on AI safety and alignment, focusing on risks from advanced AI systems. The role combines scientific rigor with engineering expertise to build elegant experiments that help understand and steer powerful AI behavior, particularly through scalable oversight and AI control techniques. This position collaborates across interpretability, fine-tuning, and red team functions to ensure AI systems remain helpful, honest, and harmless at human-level capabilities and beyond.

NLP LLMS · MACHINE LEARNING · REINFORCEMENT LEARNING · EVALS

Research Engineer, Reward Models Platform

Anthropic

Remote-Friendly (Travel Required) | San Francisco, CA | Seattle, WA | New York City, NY · Remote · mid · 2d ago

Build infrastructure and tooling for Anthropic's reward models platform, automating research workflows around rubric development, human feedback analysis, and reward evaluation. This role bridges research and engineering, requiring strong Python fundamentals and experience with ML systems, to enable faster iteration on reward methodologies used for training AI models.

FINE TUNING · REINFORCEMENT LEARNING · EVALS · MLOPS INFRA · +1 more

Research Engineer / Research Scientist, Vision

Anthropic

New York City, NY; San Francisco, CA; Seattle, WA · senior · 2d ago

Anthropic seeks a senior Research Engineer with 7+ years of ML and computer vision expertise to advance Claude's visual and spatial reasoning capabilities. The role involves developing vision language model architectures, creating multimodal datasets and evaluations, and collaborating across teams to solve real-world customer challenges. This is a full-stack research position spanning pretraining, reinforcement learning, and deployment-time techniques.

COMPUTER VISION · NLP LLMS · MACHINE LEARNING · REINFORCEMENT LEARNING · +2 more

Research Engineer/Research Scientist, Audio

Anthropic

San Francisco, CA · mid · 2d ago

Anthropic seeks a Research Engineer/Scientist to advance audio capabilities in large language models, focusing on speech understanding, generation, and multimodal audio integration. The role involves a roughly 50/50 split between research and engineering work across the full audio ML stack, from signal processing to large-scale model training and deployment.

NLP LLMS · MACHINE LEARNING · FINE TUNING · REINFORCEMENT LEARNING

Research Engineer, Production Model Post-Training

Anthropic

Zürich, CH · mid · 2d ago

A hybrid research-engineering role at Anthropic focused on implementing and optimizing post-training techniques for frontier language models. The position requires strong software engineering skills combined with ML expertise to translate cutting-edge alignment research into production systems, with direct responsibility for model quality and safety.

NLP LLMS · FINE TUNING · REINFORCEMENT LEARNING · EVALS · +1 more

Research Engineer, Production Model Post-Training

Anthropic

San Francisco, CA | New York City, NY | Seattle, WA · mid · 2d ago

Research Engineer focused on implementing and optimizing post-training techniques for Claude models at scale, including Constitutional AI and RLHF methodologies. The role requires strong software engineering skills, distributed systems expertise, and the ability to translate cutting-edge research into production-ready implementations while managing production incidents.

NLP LLMS · FINE TUNING · REINFORCEMENT LEARNING · MLOPS INFRA · +2 more

Research Engineer, Performance RL

Anthropic

San Francisco, CA · mid · 2d ago

Research Engineer role on Anthropic's Code RL team focused on improving Claude's ability to write correct, optimized code for accelerators like GPUs. You'll design RL environments and evaluation metrics, conduct experiments, and deliver work into production training runs, requiring deep expertise in both reinforcement learning and accelerator performance optimization.

REINFORCEMENT LEARNING · NLP LLMS · SYSTEM DESIGN

Research Engineer, Machine Learning (Reinforcement Learning)

Anthropic

San Francisco, CA | New York City, NY · mid · 2d ago

Anthropic seeks a Research Engineer to advance reinforcement learning capabilities for large language models, including autonomous systems, code generation, and agentic tool use. This hybrid research-engineering role involves architecting scalable RL infrastructure, designing novel training environments and evaluations, and implementing fundamental research at production scale. The position requires both deep technical expertise in RL and strong systems engineering skills to push the boundaries of AI capabilities.

REINFORCEMENT LEARNING · NLP LLMS · AGENTS · MLOPS INFRA · +2 more

Research Engineer, Machine Learning (Reinforcement Learning)

Anthropic

London, UK · mid · 2d ago

Anthropic seeks a Research Engineer to advance reinforcement learning capabilities in large language models, focusing on agentic systems, tool use, and code generation. The role blends research innovation with engineering excellence, requiring design and implementation of novel RL training environments, infrastructure optimization, and collaboration across research and production teams to scale cutting-edge systems.

REINFORCEMENT LEARNING · NLP LLMS · AGENTS · MLOPS INFRA · +2 more

Research Engineer, Frontier Red Team (Autonomy)

Anthropic

San Francisco, CA · mid · 2d ago

This role focuses on building and evaluating autonomous AI systems to understand and defend against adversarial use of advanced AI. You'll develop model organisms of self-improving systems and create defensive agents, directly influencing Anthropic's safety research and AI policy at a critical juncture in AI development.

AGENTS · REINFORCEMENT LEARNING · EVALS · SYSTEM DESIGN · +1 more

Research Engineer, Environment Scaling

Anthropic

Remote-Friendly (Travel Required) | San Francisco, CA · Remote · mid · 2d ago

This role focuses on building and scaling RL training environments for LLM adaptation across new domains and use cases. The ideal candidate combines ML research expertise in fine-tuning and reward design with strong project management and vendor relationship skills to deliver end-to-end capability improvements to Anthropic's models.

NLP LLMS · FINE TUNING · REINFORCEMENT LEARNING · EVALS