Research Engineer - Post-Training & Small Language Models (SLMs), Healthcare AI Job, Scottsdale area, Arizona, USA, Software Development

Position: Research Engineer — Post-Training & Small Language Models (SLMs), Healthcare AI

Three hundred fifty million Americans rely on a healthcare system whose decision-making has become slow, costly, and adversarial - care delayed by prior authorization and paperwork, claims that misfire, clinical decisions made without the right information at the right moment, and patients who struggle to navigate or afford the care they need. Deloitte has a new AI‑first effort, backed by $1B in committed investment, building the reasoning models and agentic systems to rebuild how that system decides - across payers, providers, and life sciences, and for the patients they serve - so that care is faster, fairer, and far less wasteful.

This is not AI applied at the margins. It is a ground‑up rebuild of the decision‑making machinery behind American healthcare, at national scale.

The effort is resourced for real post‑training at scale - committed investment in GPU compute and training infrastructure, not toy fine‑tunes.

As a Research Engineer on our post‑training team, you will design, train, evaluate, and align the models that reason about healthcare - working across the full post‑training lifecycle to shape model behavior for clinical and operational decisioning across the industry. Healthcare decisioning is one of the cleanest verifiable‑reward domains outside math and code: the problems are hard, but the reward can be grounded in real signals - clinical policy and criteria, adjudicated outcomes, and clinical‑expert judgment - so correctness is checkable rather than asserted.

You will own the post‑training stack for our clinical reasoning models end to end - from data and reward design through trained, evaluated models that ship. This is not a prompt‑engineering role. We are looking for people who understand not just how to use LLMs, but how to improve and shape model behavior through advanced post‑training.

You do not need a healthcare background. We pair every engineer with clinical and domain experts and teach you the domain - you bring the modeling depth.

We hire on demonstrated depth, not years: the level you join at is determined through our interview process, based on the depth and judgment you show rather than time spent in a title.

Work you’ll do

Post‑training & alignment

Design and execute post‑training pipelines: supervised fine‑tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.
Build and optimize training using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, and understand how each affects reasoning quality, safety, latency, cost, and reliability.
Train reasoning models for healthcare decisioning using verifiable‑reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.

Reward modeling & data

Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.
Curate, clean, synthesize, and evaluate large‑scale instruction, preference, and domain‑specific datasets, with rigorous filtering, deduplication, and quality control.
Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical‑expert labeling - turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale.

Efficient fine‑tuning, training & inference infrastructure

Implement efficient fine‑tuning strategies including LoRA, QLoRA, PEFT, and adapter‑based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron‑LM, Ray, or equivalent.
Optimize inference performance - latency, throughput, quantization, and deployment efficiency - for production, including frameworks such as vLLM, TensorRT‑LLM, or TGI.

Small language models & open‑weight models

Train and optimize open‑weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on‑premise and cloud‑hybrid deployment with strong performance‑per‑dollar.

Evaluation, safety & red teaming

Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain‑specific metrics.
Build…
