Software Engineer - RL Environments

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: AfterQuery
Full Time position
Listed on 2026-06-01
Job specializations:
  • Software Development
    Data Scientist, Data Engineer, Machine Learning/ ML Engineer, AI Engineer
Salary/Wage Range or Industry Benchmark: 200,000 USD yearly
Job Description & How to Apply Below
About AfterQuery

AfterQuery builds the training data and evaluation infrastructure that frontier AI labs use to make their models better. We work with the world's leading labs to design high-signal datasets and run rigorous evaluations that go beyond static benchmarks. We are a small, early team (post-Series A) where individual contributors have a direct impact on how the next generation of models learns and improves.

The Role

As a SWE (Environments), you will design the datasets and evaluation rubrics that directly influence how frontier models learn. You'll work hands-on with research teams at top AI labs, experimenting with data collection strategies, diagnosing model failure modes, and developing the metrics that determine whether a model is actually improving. You'll go from hypothesis to live experiment quickly, and your output will feed directly into model training runs at scale.

Day to day, you will design data slices that expose meaningful failure modes across domains like finance, code, and enterprise workflows. You will build and refine reward signals for RLHF and RLVR pipelines. You will develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on alignment and capability. You will partner with lab research teams to translate their training objectives into concrete data and evaluation specifications.

What You'll Do
  • Design data slices and explore data shapes that expose meaningful model failure modes across domains like finance, code, and enterprise workflows
  • Build and refine evaluation rubrics and reward signals for RLHF and RLVR training pipelines
  • Model annotator behavior and run experiments to improve different model capabilities
  • Develop quantitative frameworks for measuring dataset quality, diversity, and downstream impact on model alignment and capability
  • Create and manage both real-world and synthetic data pipelines
  • Partner with lab research teams to translate their training objectives into concrete data and evaluation specifications
What We're Looking For
  • 1-4 years of experience
  • Major plus if you have worked or interned at an RL environment company, or at an AI safety or benchmarking org such as METR or Artificial Analysis
  • Genuine obsession with how data structure, selection, and quality drive model behavior
  • Ability to design lightweight experiments, move fast, and extract actionable insights from messy results
  • Former founders and early engineers at early-stage startups are a plus. We don't filter on pedigree. We want people who can demonstrate they work hard, learn fast, and care deeply about getting the details right.
Compensation Structure:

$200k base + profit share (around 150% of base) + competitive equity