
Research Engineer - Post-Training & Small Language Models (SLMs), Healthcare AI

Job in Scottsdale, Maricopa County, Arizona, 85261, USA
Listing for: PowerToFly
Apprenticeship/Internship position
Listed on 2026-06-20
Job specializations:
  • Software Development
    Machine Learning / ML Engineer, AI Engineer (Applied/Software)
Salary/Wage Range or Industry Benchmark: 150,000 - 200,000 USD yearly
Job Description & How to Apply Below
Position: Research Engineer — Post-Training & Small Language Models (SLMs), Healthcare AI

Three hundred fifty million Americans rely on a healthcare system whose decision-making has become slow, costly, and adversarial - care delayed by prior authorization and paperwork, claims that misfire, clinical decisions made without the right information at the right moment, and patients who struggle to navigate or afford the care they need. Deloitte has a new AI‑first effort, backed by $1B in committed investment, building the reasoning models and agentic systems to rebuild how that system decides - across payers, providers, and life sciences, and for the patients they serve - so that care is faster, fairer, and far less wasteful.

This is not AI applied at the margins. It is a ground‑up rebuild of the decision‑making machinery behind American healthcare, at national scale.

The effort is resourced for real post‑training at scale: committed investment in GPU compute and training infrastructure, not toy fine‑tunes.

As a Research Engineer on our post‑training team, you will design, train, evaluate, and align the models that reason about healthcare - working across the full post‑training lifecycle to shape model behavior for clinical and operational decisioning across the industry. Healthcare decisioning is one of the cleanest verifiable‑reward domains outside math and code: the problems are hard, and we ground the reward in real signals - clinical policy and criteria, adjudicated outcomes, and clinical‑expert judgment - so correctness is checkable rather than asserted.

You will own the post‑training stack for our clinical reasoning models end to end - from data and reward design through trained, evaluated models that ship. This is not a prompt‑engineering role. We are looking for people who understand not just how to use LLMs, but how to improve and shape model behavior through advanced post‑training.

You do not need a healthcare background. We pair every engineer with clinical and domain experts and teach you the domain - you bring the modeling depth.

We hire on demonstrated depth, not tenure: the level you join at is set through our interview process, based on the depth and judgment you show rather than your years in a title.

Work you’ll do
Post‑training & alignment
  • Design and execute post‑training pipelines: supervised fine‑tuning (SFT), preference optimization, and reinforcement learning / alignment workflows.
  • Build and optimize training workflows using techniques such as SFT, RLHF, PPO, DPO, GRPO, RLAIF, and Constitutional AI, understanding how each affects reasoning quality, safety, latency, cost, and reliability.
  • Train reasoning models for healthcare decisioning using verifiable‑reward RL - designing reward signals and verifiers grounded in clinical guidelines, policy and criteria, and adjudicated outcomes.
Reward modeling & data
  • Develop reward models and preference datasets to improve reasoning quality, factuality, safety, policy adherence, and task performance.
  • Curate, clean, synthesize, and evaluate large‑scale instruction, preference, and domain‑specific datasets, with rigorous filtering, deduplication, and quality control.
  • Build verification and reward pipelines from our proprietary clinical, claims, and operational data and from clinical‑expert labeling - turning guidelines, policy, and adjudicated outcomes into checkable reward signals at scale.
Efficient fine‑tuning, training & inference infrastructure
  • Implement efficient fine‑tuning strategies including LoRA, QLoRA, PEFT, and adapter‑based approaches; build scalable distributed training using DeepSpeed, FSDP, Megatron‑LM, Ray, or equivalent.
  • Optimize inference performance - latency, throughput, quantization, and deployment efficiency - for production, including frameworks such as vLLM, TensorRT‑LLM, or TGI.
Small language models & open‑weight models
  • Train and optimize open‑weight models such as Llama, Qwen, Mistral, or DeepSeek; build specialized small language models (SLMs) for on‑premise and cloud‑hybrid deployment with strong performance‑per‑dollar.
Evaluation, safety & red teaming
  • Design evaluation frameworks covering reasoning, hallucination detection, factuality, instruction following, structured outputs, and domain‑specific metrics.
  • Build…