
Post-Training — Engineer / Algorithm Researcher

Job in Menlo Park, San Mateo County, California, 94029, USA
Listing for: Stealth Startup
Apprenticeship/Internship position
Listed on 2026-06-24
Job specializations:
  • IT/Tech
    Machine Learning/ML Engineer, AI Evaluation, Data Scientist
Salary/Wage Range or Industry Benchmark: 150,000 - 200,000 USD yearly
Job Description & How to Apply Below
Position: Post-Training — Engineer / Algorithm Researcher [33248]
  • Post-training algorithm R&D. Own the full post-training pipeline for the coding agent—supervised fine-tuning (SFT), reward modeling, and reinforcement learning (RLHF/DPO/GRPO/PPO, etc.)—continuously improving code generation, debugging, and multi-step reasoning on real software-engineering tasks.
  • Verifiable rewards & agentic RL. Design reward mechanisms based on verifiable signals (unit tests, compile/execution results, static checks, etc.) for coding scenarios (RLVR); build a multi-turn agentic RL training paradigm with tool-call and execution-feedback loops, improving success rate and stability on long-horizon tasks.
  • Evaluation-model training. Develop evaluation/judge models for coding tasks (LLM-as-a-Judge, generative reward models, critic/verifier models, etc.); use post-training to give them highly consistent judgment of code correctness, executability, and quality; continuously improve alignment with human annotation and verifiable signals to reduce evaluation bias and noise.
  • Data & reward-signal engineering. Lead the construction and governance of post-training data—preference-data collection, synthetic-data generation, difficulty grading, and quality filtering; identify and mitigate reward hacking and distribution drift to keep training and evaluation signals reliable.
  • Training–evaluation loop. Partner with the evaluation team to build an end-to-end evaluation system for coding agents (SWE-bench-style benchmarks, in-house task sets); feed results back into post-training iteration to create a fast experiment–verify–converge cadence.
  • Training at scale. Work closely with the infra team to land RL training efficiently on large clusters; optimize the coordination of rollout sampling, inference engines (vLLM/SGLang), and the training framework to raise overall throughput and sample efficiency.
Qualifications
  • Education. Bachelor's degree or above in CS, AI, Mathematics, Statistics, or a related field; Master's/PhD preferred.
  • Post-training experience. Deep understanding of the LLM post-training stack; complete hands-on experience in at least one of SFT, RLHF, DPO/GRPO/PPO, or reward modeling; able to independently run the full experiment loop from data to training to evaluation.
  • Evaluation-model experience. Understanding of reward-model / judge-model training and evaluation; familiarity with LLM-as-a-Judge, pairwise/pointwise scoring, and verifier paradigms; experience with evaluation consistency, calibration, and bias analysis a plus.
  • RL foundations. Solid grasp of RL fundamentals (policy gradients, value functions, advantage estimation, etc.); experience with stability, sample efficiency, and hyperparameter tuning of RL training in the LLM setting.
  • Engineering ability. Proficient in Python with a solid foundation in data structures and algorithms; skilled with PyTorch, with hands-on experience using or extending mainstream post-training/RL frameworks (TRL, veRL, OpenRLHF, DeepSpeed-Chat, etc.).
  • Coding-domain understanding. Understanding of code generation and software-engineering task characteristics; able to build effective training and evaluation signals around verifiable rewards, sandboxed execution, and test-case design.
  • Research & debugging. Able to read and reproduce frontier papers; strong analysis and diagnosis of training-curve anomalies, reward collapse, model degradation, and evaluation drift.