
Researcher, Post-Training

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: MakerMaker.AI
Apprenticeship/Internship position
Listed on 2026-06-07
Job specializations:
  • Research/Development
    Data Scientist
Salary/Wage Range or Industry Benchmark: 120,000 - 160,000 USD yearly
Job Description & How to Apply Below
Position: RESEARCHER, POST-TRAINING

ABOUT THE COMPANY

We're building autonomous research agents for recursive self-improvement (multi-agent systems that propose, run, and analyze machine learning experiments). We're a small team based in San Francisco, on-site.

ABOUT THE ROLE

You'll lead our work on model post-training: supervised fine-tuning, preference data, reinforcement learning from human and AI feedback, reward modeling, and the evaluation suites that tell us what's actually working. You'll own a research area that meaningfully shapes our model behavior and capability.

This is a hands‑on senior research role. You'll set direction, run experiments, and ship into production. You'll partner with the data, infrastructure, and engineering teams to make the post‑training pipeline reliable and fast: improvements there compound into every model we ship.

WHAT YOU'LL DO
  • Lead post‑training research: SFT, RLHF/RLAIF, RLVR, DPO and successor methods, reward modeling, preference data design
  • Design and curate the data that goes into post‑training (from sourcing, to filtering, to quality assessment)
  • Build and maintain the evaluation suites that measure what matters; resist Goodharting your own benchmarks
  • Run rigorous experiments (controls, ablations, statistical significance) and write up internal findings clearly
  • Work with the data team to scale data pipelines and with the infrastructure team to scale training
  • Identify and characterize failure modes (reward hacking, distribution drift, eval saturation) and design experiments to address them
  • Stay current on the post‑training literature; bring useful methods in, ignore the noise
WHAT WE'RE LOOKING FOR
  • Strong track record of post‑training research (SFT, RL, reward modeling) at a frontier‑model lab or equivalent
  • 5+ years of hands‑on ML research experience
  • Comfort with large‑scale data curation and preference‑data pipelines
  • Experience designing evaluation suites for capabilities that aren't easily benchmarked
  • Fluent in PyTorch or equivalent; comfortable at the scale of distributed training
  • Strong statistical instincts: you'd notice a flawed comparison before someone else points it out
  • Strong written communication
NICE TO HAVE
  • PhD in ML, statistics, CS, or adjacent
  • Published research at NeurIPS, ICML, ICLR, COLM, RLC, or comparable venues
  • Experience with reward hacking detection, scaling reward models, or RLHF infrastructure
  • Synthetic data generation experience
  • Background in RL math (policy gradients, importance sampling, off‑policy methods)
  • Open‑source contributions to post‑training infrastructure
THIS ROLE IS PROBABLY NOT FOR YOU IF
  • You are primarily interested in pretraining (that's a different role)
  • You would rather invent novel methods in isolation than ship them into a model that real users run
  • You prefer stable benchmarks over evaluation work where the right answer isn't yet defined