Fine-tuning Jobs
Engineering and research roles in LLM fine-tuning — PEFT, LoRA, RLHF, instruction tuning.
24 open positions
Tech Lead Manager - Multi Modal Foundation Models (Language)
Wayve
Lead a team developing language grounding and reasoning capabilities for Wayve's multimodal foundation models used in autonomous driving. Drive foundational research at the intersection of large-scale pretraining, language understanding, and embodied agent alignment, with a focus on grounded understanding in real-world contexts.
Senior Research Scientist (Must be based in UK)
PolyAI
PolyAI is seeking a Senior Research Scientist to lead cutting-edge research on large language model post-training for conversational voice assistants. The role focuses on novel approaches including conversational reinforcement learning, audio-native LLMs, streaming turn-taking, and reasoning model distillation. This is a research-focused leadership position requiring expertise in LLMs and dialogue systems, based in the UK.
Applied Research Intern
Labelbox
Design and build evaluation systems and benchmarks for frontier LLMs and multimodal models, including reasoning, code, and agent-use tasks. Create post-training datasets and prototype RLHF/RLAIF/DPO-style training loops to measure and improve model performance on real-world tasks.
Applied Research Engineer
Labelbox
Develop cutting-edge systems for creating and leveraging high-quality human feedback data to train frontier AI models using techniques like RLHF and DPO. Design advanced methods to align human preferences with AI training processes and measure the quality and impact of human-in-the-loop data. Work at the intersection of applied research and engineering to solve critical data-centric challenges in modern AI development.
Senior Research Engineer, Post-training & Evaluation
Reddit
Reddit is seeking a Senior Research Engineer to own the post-training and evaluation pipeline for their Reddit-native Large Language Models. You will architect evaluation suites, build internal benchmarks (Reddit Benchmark), and execute fine-tuning workflows to ensure models are safe, performant, and culturally aligned with Reddit communities. This role bridges applied research and massive-scale infrastructure, sitting at the core of Reddit's AI foundation.
Director of Machine Learning, Safety & Mods
Reddit
Reddit seeks a Director of Machine Learning to lead safety and moderation ML initiatives, building industry-leading systems that detect and prevent harmful content at scale. The role combines strategic leadership with hands-on ML expertise in fine-tuned LLMs and transformer models, requiring cross-functional collaboration across product, engineering, and AI/ML platform teams to protect global users.
Senior Machine Learning Engineer II, NLU & Agentic AI
Moveworks
Senior Machine Learning Engineer role focused on advancing NLU and agentic AI capabilities at Moveworks. Requires expertise in LLM fine-tuning, conversational agents, compound AI systems, and production-grade ML infrastructure to build reliable enterprise copilot experiences. Strong emphasis on balancing model quality with latency, reliability, and end-to-end system performance.
Senior Machine Learning Engineer II, NLU & Agentic AI
Moveworks
Senior ML engineer role focused on building production NLU and agentic AI systems at scale. You'll work on LLM fine-tuning, reasoning strategies, multimodal agents, and compound AI system design while collaborating with annotation and product teams to deliver enterprise-grade conversational AI.
Senior Research Scientist, Reward Models
Anthropic
Lead research scientist role focused on advancing reward modeling techniques for large language models, particularly RLHF training and LLM-based evaluation methods. The position emphasizes developing novel architectures, mitigating reward specification gaming, and translating research into production improvements while driving AI alignment and safety initiatives.
Research Scientist, Societal Impacts
Anthropic
Research Scientist role focused on analyzing Claude's real-world behavior using observational tools and building evaluations to assess safety and alignment with constitutional AI principles. The position bridges research insights with model improvements across fine-tuning, safeguards, and interpretability teams while contributing to understanding societal impacts of AI systems.
Research Product Manager, Model Behaviors
Anthropic
This is a senior product management role at Anthropic focused on shaping Claude's behaviors and alignment through reinforcement learning and model fine-tuning. The ideal candidate combines 5+ years of conversational AI product experience with deep ML knowledge, user empathy, and the judgment to navigate nuanced AI safety and behavior tradeoffs. The role requires translating alignment research into scaled product improvements while coordinating across research, product, and safeguards teams.
Research Engineer, Virtual Collaborator (Cowork)
Anthropic
Anthropic seeks a senior Research Engineer to design and implement reinforcement learning pipelines that fine-tune Claude for virtual collaborator workflows in enterprise settings. The role bridges research and product, requiring deep expertise in RL environments, reward modeling, and data generation platforms to train Claude on realistic organizational tasks while maintaining alignment with product requirements.
Research Engineer, Reward Models Platform
Anthropic
Build infrastructure and tooling for Anthropic's reward models platform, automating research workflows around rubric development, human feedback analysis, and reward evaluation. This role bridges research and engineering, requiring strong Python fundamentals and experience with ML systems to enable faster iteration on reward methodologies used for training AI models.
Research Engineer / Research Scientist, Vision
Anthropic
Anthropic seeks a senior Research Engineer with 7+ years of ML and computer vision expertise to advance Claude's visual and spatial reasoning capabilities. The role involves developing vision language model architectures, creating multimodal datasets and evaluations, and collaborating across teams to solve real-world customer challenges. This is a full-stack research position spanning pretraining, reinforcement learning, and deployment-time techniques.
Research Engineer/Research Scientist, Audio
Anthropic
Anthropic seeks a Research Engineer/Scientist to advance audio capabilities in large language models, focusing on speech understanding, generation, and multimodal audio integration. The role requires a 50/50 split between research and engineering work across the full audio ML stack, from signal processing to large-scale model training and deployment.
Research Engineer, Production Model Post-Training
Anthropic
A hybrid research-engineering role at Anthropic focused on implementing and optimizing post-training techniques for frontier language models. The position requires strong software engineering skills combined with ML expertise to translate cutting-edge alignment research into production systems, with direct responsibility for model quality and safety.
Research Engineer, Production Model Post-Training
Anthropic
Research Engineer focused on implementing and optimizing post-training techniques for Claude models at scale, including Constitutional AI and RLHF methodologies. The role requires strong software engineering skills, distributed systems expertise, and the ability to translate cutting-edge research into production-ready implementations while managing production incidents.
Research Engineer, Environment Scaling
Anthropic
This role focuses on building and scaling RL training environments for LLM adaptation across new domains and use cases. The ideal candidate combines ML research expertise in fine-tuning and reward design with strong project management and vendor relationship skills to deliver end-to-end capability improvements to Anthropic's models.
Research Engineer, Agents
Anthropic
This role focuses on advancing Claude's capabilities as an autonomous agent by designing novel agent architectures, building rigorous evaluation benchmarks, and optimizing training data for agentic tasks. You'll work on core challenges in agent harness design, long-horizon task execution, and multi-agent coordination while collaborating across research and product teams.
Privacy Research Engineer, Safeguards
Anthropic
Anthropic seeks a Privacy Research Engineer to design and implement privacy-preserving techniques for large language models, audit current approaches, and establish privacy policies. The role requires deep expertise in both privacy-preserving ML and LLM training, with a track record of shipping products in fast-paced environments and leading cross-functional security initiatives.
Machine Learning Systems Engineer, RL Engineering
Anthropic
Build and optimize the algorithms and infrastructure that power Anthropic's reinforcement learning pipeline for training Claude models. Focus on improving performance, reliability, and usability of RLHF and related fine-tuning systems that enable rapid research iteration. This is a systems-oriented engineering role requiring 4+ years of software experience and expertise in large-scale distributed systems.
Machine Learning Systems Engineer, Research Tools
Anthropic
This role focuses on designing and optimizing tokenization and encoding systems that bridge Anthropic's pretraining and fine-tuning workflows. The engineer will build critical ML infrastructure, debug data processing pipelines, and enable researchers to experiment with novel encoding approaches while ensuring system reliability and performance.
Head of Solutions Architects, Applied AI (Korea)
Anthropic
Founding leadership role establishing Anthropic's Applied AI Solutions Architecture practice in Korea, responsible for building and managing a technical team while driving enterprise adoption of Claude products. The role combines deep technical expertise in LLM systems with consultative sales experience, strategic partnerships, and team leadership to enable large-scale AI transformation across Korean enterprises.
[Expression of Interest] Research Scientist / Engineer, Honesty
Anthropic
Anthropic seeks a Research Scientist/Engineer to develop techniques for improving honesty and minimizing hallucinations in large language models within their Finetuning Alignment team. The role focuses on creating data pipelines, classifiers, benchmarks, and reinforcement learning systems to ensure models are accurate, well-calibrated, and truthful across domains. Candidates should have an MS/PhD in CS/ML, strong Python skills, and industry experience with language model fine-tuning and evaluation.