Reinforcement Learning Environment Engineer

Remote / Online - Candidates ideally in
Bakersfield, Kern County, California, 93301, USA
Listing for: Open Data Science
Full Time, Remote/Work from Home position
Listed on 2026-06-06
Job specializations:
  • IT/Tech
    AI Engineer, Data Scientist
Salary/Wage Range or Industry Benchmark: 150000 - 200000 USD Yearly
Job Description & How to Apply Below

RL Environments; MLE; LLM Tasks;
Difficulty Distribution;
Remote Contractor; PST Overlap (≥4h);
Advanced English (C1/C2);

We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and deliver about one task every 8-10 hours. This is a remote contractor role; at least 4 hours of overlap with PST working hours and advanced English (C1/C2) are required.

About the company

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks that we want to use these models for are outside of their training data distribution. Preference Model creates reinforcement learning environments that encapsulate real-world use cases, enabling AI systems to practice, adapt, and learn from feedback grounded in reality.

We seek to bring the real world into distribution for the models.

Our founding team has previous experience on Anthropic’s data team building data infrastructure, tokenizers, and datasets behind the Claude model. We are partnering with leading AI labs to push AI closer to achieving its transformative potential.

The company is backed by Tier 1 Silicon Valley venture capital.

Responsibilities
  • Design and build MLE/SWE environments and diverse tasks.
  • Target a specified language model and satisfy the required difficulty distribution.
  • Deliver ~1 task per 8-10 hours once onboarded.
  • Edit tasks within 24 hours based on customer feedback.
  • Onboard quickly and start delivering on day one with minimal supervision.
Requirements
What we’re looking for (must-haves)
  • Strong Python (engineering-quality, not notebook‑only).
  • Hands‑on LLM/GenAI work in production: you’ve shipped and operated real systems (not “wrapped an API and called it AI”).
  • Strong product/engineering ownership: comfortable building, fixing, and scaling end‑to‑end pipelines.
  • ≥4 hours PST overlap and advanced English (C1/C2) for specs, reviews, and feedback.
  • Ability to meet throughput expectations and respond quickly to feedback.
Strong signals (nice‑to‑have, big plus)
  • Experience in high‑stakes or regulated domains (e.g., healthcare, finance, fraud/risk, safety‑critical systems).
  • Experience designing environments/tasks for RL and/or evaluations.
  • Exposure to RL / bandits / agentic systems (not required, but a strong signal).
Not a fit if
  • You’re primarily a prompt engineer without strong ML/engineering foundations.
  • You’re a research‑only / academic‑only profile with little or no shipping/production ownership.
  • You’ve only built in notebooks or rely heavily on managed AutoML tools.
Working conditions
  • Full time, with at least 4 hours of overlap with the team's working hours in the Pacific time zone.
  • Deliverables-driven; begin shipping on day one.
  • Conversion & relocation:
    Potential path to FTE and relocation to the Bay Area if performance and mutual fit align.