Applied Reinforcement Learning Engineer Job, Redmond area, Washington, USA, IT/Tech

Location:
Palo Alto, CA or Seattle, WA (Hybrid/Remote)

Salary: $150K – $300K Annually

About Centific

Centific is a frontier AI data foundry that curates diverse, high-quality data, using our purpose-built technology platforms to empower the Magnificent Seven and our enterprise clients with safe, scalable AI deployment. Our team includes more than 150 PhDs and data scientists, along with 4,000+ AI practitioners and engineers, and an integrated ecosystem of 1.8 million vertical domain experts across 230+ markets.

Our zero-distance innovation™ solutions for GenAI can reduce GenAI costs by up to 80% and bring solutions to market 50% faster.

About the Team

Centific AI Research advances foundational AI models and applications through reinforcement learning, alignment, and human-centered intelligence. We're building governed simulation environments that let enterprises safely iterate and improve AI agent workflows — bridging human-labeled signal creation with automated post-training for high-stakes operations.

The Role

You’ll build simulation environments that mirror real enterprise workflows and post-train LLM agents inside them. Your environments, reward functions, and verifiers become the training ground for production agents handling document processing, compliance, customer operations, and multi-step reasoning across regulated industries.

This role sits at the intersection of LLM post-training research and production engineering. You’ll translate customer workflows into bespoke environments, design reward signals that hold up under optimization pressure, and ship pipelines that turn human-labeled traces into measurable agent improvements.

What You’ll Do

Design simulation environments and digital twins for enterprise workflows
Post-train LLM agents using the right method for the task — RLHF, DPO, GRPO, PPO, and whatever comes next
Build pipelines that turn human-labeled traces and verifiable signals into training data
Architect multi-turn, tool-using agents with closed learning loops
Design reward functions and verifiers that resist reward hacking and reflect real task outcomes
Translate research into production; contribute to publications

Required Qualifications

3+ years fine-tuning LLMs, with hands-on experience in RL post-training
Experience building or training LLM-based agents — tool use, multi-turn reasoning, trajectory evaluation
Strong Python and software engineering skills; comfortable building pipelines, not just notebooks
Working knowledge of modern post-training and rollout-serving libraries
MS/PhD in CS, ML, or related field, or equivalent industry experience

Preferred Qualifications

Publications at NeurIPS, ICML, ICLR, ACL, COLM, or similar venues
Open-source contributions to post-training or agent frameworks (TRL, veRL, OpenRLHF, SkyRL, or similar)
Background in classical RL
Domain experience in healthcare, finance, logistics, or compliance
Experience with synthetic data generation, simulation, or world models
Distributed training experience

Why Join Centific

Lead the frontier. Shape a new discipline at the intersection of post-training, simulation, and enterprise AI
Ship your science. See your research power real systems across healthcare, finance, and safety-critical operations
Collaborate with leaders. Work alongside NVIDIA, Microsoft, and the global AI community
Build what matters. Create governed, compliant AI systems enterprises can actually trust

Centific is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, citizenship status, age, mental or physical disability, medical condition, sex (including pregnancy), gender identity or expression, sexual orientation, marital status, familial status, veteran status, or any other characteristic protected by applicable law. We consider qualified applicants regardless of criminal histories, consistent with legal requirements.
