RL Environments Engineer; Contractor, Remote

Remote / Online - Candidates ideally in
San Francisco, San Francisco County, California, 94199, USA
Listing for: Preference Model, Inc.
Contract, Remote/Work from Home position
Listed on 2026-02-11
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer

RL Environments Engineer (Remote, Contractor) - Preference Model
About the company

Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases because so many of the tasks we want to use them for are out of distribution. Preference Model creates RL environments where models encounter research and engineering problems, iterate, and learn from realistic feedback loops.

Our founding team previously worked on Anthropic’s data team, building the data infrastructure, tokenizers, and datasets behind Claude. We are partnering with leading AI labs to push AI closer to achieving its transformative potential. We are backed by Tier 1 Silicon Valley VC.

Brief Description of the Role

We’re hiring RL Environments Engineers to design and build MLE environments. The goal is to teach LLMs better reasoning and advanced concepts from modern ML.
This is a remote contractor role. At least 4 hours of overlap with PST and advanced English (C1/C2) are required.

Minimum Qualifications:
  • Strong Python (engineering-quality, not notebook-only)
  • Docker + production mindset (debugging, reliability, iteration speed)
  • Clear understanding of LLMs and their current limitations
  • Ability to meet throughput expectations and respond quickly to feedback
You may be a good fit if one of the following applies:
  • Deep understanding of transformer internals and of training/inference of modern LLMs, plus experience with inference libraries (vLLM, SGLang, etc.)
  • Strong expertise in CUDA or Pallas kernel development, optimizing non-trivial neural modules for specific hardware
  • Expert knowledge in an active DL/ML research area, with publications or public code to show for it
  • You have strong fundamentals and broad research interests: you read many papers, understand them deeply, and have the creativity to translate them into RLVR problems
  • You have built complex interactive RL environments and have strong insights into open-ended RL-based learning systems