Fine-tuning Jobs
Engineering and research roles in LLM fine-tuning — PEFT, LoRA, RLHF, instruction tuning.
24 open positions
Tech Lead Manager - Multi Modal Foundation Models (Language)
Wayve
Lead a team developing language grounding and reasoning capabilities for Wayve's multimodal foundation models used in autonomous driving. Drive foundational research at the intersection of large-scale pretraining, language understanding, and embodied agent alignment, with a focus on grounded understanding in real-world contexts.
Senior Research Scientist (Must be based in UK)
PolyAI
PolyAI is seeking a Senior Research Scientist to lead cutting-edge research on large language model post-training for conversational voice assistants. The role focuses on novel approaches including conversational reinforcement learning, audio-native LLMs, streaming turn-taking, and reasoning model distillation. This is a research-focused leadership position requiring expertise in LLMs and dialogue systems, based in the UK.
Applied Research Intern
Labelbox
Design and build evaluation systems and benchmarks for frontier LLMs and multimodal models, including reasoning, code, and agent-use tasks. Create post-training datasets and prototype RLHF/RLAIF/DPO-style training loops to measure and improve model performance on real-world tasks.
Applied Research Engineer
Labelbox
Develop cutting-edge systems for creating and leveraging high-quality human feedback data to train frontier AI models using techniques like RLHF and DPO. Design advanced methods to align human preferences with AI training processes and measure the quality and impact of human-in-the-loop data. Work at the intersection of applied research and engineering to solve critical data-centric challenges in modern AI development.
Senior Research Engineer, Post-training & Evaluation
Reddit
Reddit is seeking a Senior Research Engineer to own the post-training and evaluation pipeline for their Reddit-native Large Language Models. You will architect evaluation suites, build internal benchmarks (Reddit Benchmark), and execute fine-tuning workflows to ensure models are safe, performant, and culturally aligned with Reddit communities. This role bridges applied research and massive-scale infrastructure, sitting at the core of Reddit's AI foundation.
Director of Machine Learning, Safety & Mods
Reddit
Reddit seeks a Director of Machine Learning to lead safety and moderation ML initiatives, building industry-leading systems that detect and prevent harmful content at scale. The role combines strategic leadership with hands-on ML expertise in fine-tuned LLMs and transformer models, requiring cross-functional collaboration across product, engineering, and AI/ML platform teams to protect global users.
Senior Machine Learning Engineer II, NLU & Agentic AI
Moveworks
Senior Machine Learning Engineer role focused on advancing NLU and agentic AI capabilities at Moveworks. Requires expertise in LLM fine-tuning, conversational agents, compound AI systems, and production-grade ML infrastructure to build reliable enterprise copilot experiences. Strong emphasis on balancing model quality with latency, reliability, and end-to-end system performance.
Senior Machine Learning Engineer II, NLU & Agentic AI
Moveworks
Senior ML engineer role focused on building production NLU and agentic AI systems at scale. You'll work on LLM fine-tuning, reasoning strategies, multimodal agents, and compound AI system design while collaborating with annotation and product teams to deliver enterprise-grade conversational AI.
Senior Research Scientist, Reward Models
Anthropic
Lead research scientist role focused on advancing reward modeling techniques for large language models, particularly RLHF training and LLM-based evaluation methods. The position emphasizes developing novel architectures, mitigating reward specification gaming, and translating research into production improvements while driving AI alignment and safety initiatives.
Research Scientist, Societal Impacts
Anthropic
Research Scientist role focused on analyzing Claude's real-world behavior using observational tools and building evaluations to assess safety and alignment with constitutional AI principles. The position bridges research insights with model improvements across fine-tuning, safeguards, and interpretability teams while contributing to understanding societal impacts of AI systems.
Research Product Manager, Model Behaviors
Anthropic
This is a senior product management role at Anthropic focused on shaping Claude's behaviors and alignment through reinforcement learning and model fine-tuning. The ideal candidate combines 5+ years of conversational AI product experience with deep ML knowledge, user empathy, and the judgment to navigate nuanced AI safety and behavior tradeoffs. The role requires translating alignment research into scaled product improvements while coordinating across research, product, and safeguards teams.
Research Engineer, Virtual Collaborator (Cowork)
Anthropic
Anthropic seeks a senior Research Engineer to design and implement reinforcement learning pipelines that fine-tune Claude for virtual collaborator workflows in enterprise settings. The role bridges research and product, requiring deep expertise in RL environments, reward modeling, and data generation platforms to train Claude on realistic organizational tasks while maintaining alignment with product requirements.
Research Engineer, Reward Models Platform
Anthropic
Build infrastructure and tooling for Anthropic's reward models platform, automating research workflows around rubric development, human feedback analysis, and reward evaluation. This role bridges research and engineering, requiring strong Python fundamentals and experience with ML systems to enable faster iteration on reward methodologies used for training AI models.
Research Engineer / Research Scientist, Vision
Anthropic
Anthropic seeks a senior Research Engineer with 7+ years of ML and computer vision expertise to advance Claude's visual and spatial reasoning capabilities. The role involves developing vision language model architectures, creating multimodal datasets and evaluations, and collaborating across teams to solve real-world customer challenges. This is a full-stack research position spanning pretraining, reinforcement learning, and deployment-time techniques.
Research Engineer/Research Scientist, Audio
Anthropic
Anthropic seeks a Research Engineer/Scientist to advance audio capabilities in large language models, focusing on speech understanding, generation, and multimodal audio integration. The role requires a 50/50 split between research and engineering work across the full audio ML stack, from signal processing to large-scale model training and deployment.
Research Engineer, Production Model Post-Training
Anthropic
A hybrid research-engineering role at Anthropic focused on implementing and optimizing post-training techniques for frontier language models. The position requires strong software engineering skills combined with ML expertise to translate cutting-edge alignment research into production systems, with direct responsibility for model quality and safety.
Research Engineer, Production Model Post-Training
Anthropic
Research Engineer focused on implementing and optimizing post-training techniques for Claude models at scale, including Constitutional AI and RLHF methodologies. The role requires strong software engineering skills, distributed systems expertise, and the ability to translate cutting-edge research into production-ready implementations while managing production incidents.
Research Engineer, Environment Scaling
Anthropic
This role focuses on building and scaling RL training environments for LLM adaptation across new domains and use cases. The ideal candidate combines ML research expertise in fine-tuning and reward design with strong project management and vendor relationship skills to deliver end-to-end capability improvements to Anthropic's models.
Research Engineer, Agents
Anthropic
This role focuses on advancing Claude's capabilities as an autonomous agent by designing novel agent architectures, building rigorous evaluation benchmarks, and optimizing training data for agentic tasks. You'll work on core challenges in agent harness design, long-horizon task execution, and multi-agent coordination while collaborating across research and product teams.
Privacy Research Engineer, Safeguards
Anthropic
Anthropic seeks a Privacy Research Engineer to design and implement privacy-preserving techniques for large language models, audit current approaches, and establish privacy policies. The role requires deep expertise in both privacy-preserving ML and LLM training, with a track record of shipping products in fast-paced environments and leading cross-functional security initiatives.
Machine Learning Systems Engineer, RL Engineering
Anthropic
Build and optimize the algorithms and infrastructure that power Anthropic's reinforcement learning pipeline for training Claude models. Focus on improving performance, reliability, and usability of RLHF and related fine-tuning systems that enable rapid research iteration. This is a systems-oriented engineering role requiring 4+ years of software experience and expertise in large-scale distributed systems.
Machine Learning Systems Engineer, Research Tools
Anthropic
This role focuses on designing and optimizing tokenization and encoding systems that bridge Anthropic's pretraining and fine-tuning workflows. The engineer will build critical ML infrastructure, debug data processing pipelines, and enable researchers to experiment with novel encoding approaches while ensuring system reliability and performance.
Head of Solutions Architects, Applied AI (Korea)
Anthropic
Founding leadership role establishing Anthropic's Applied AI Solutions Architecture practice in Korea, responsible for building and managing a technical team while driving enterprise adoption of Claude products. The role combines deep technical expertise in LLM systems with consultative sales experience, strategic partnerships, and team leadership to enable large-scale AI transformation across Korean enterprises.
[Expression of Interest] Research Scientist / Engineer, Honesty
Anthropic
Anthropic seeks a Research Scientist/Engineer to develop techniques for improving honesty and minimizing hallucinations in large language models within their Finetuning Alignment team. The role focuses on creating data pipelines, classifiers, benchmarks, and reinforcement learning systems to ensure models are accurate, well-calibrated, and truthful across domains. Candidates should have an MS/PhD in CS/ML, strong Python skills, and industry experience with language model fine-tuning and evaluation.