
AI Research Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Cerebro
Full Time position
Listed on 2026-06-18
Job specializations:
  • Research/Development
    AI Evaluation, Data Annotation/AI Labeling, Data Scientist
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below

AI Researcher – Data, Evaluation & Alignment

We're building critical infrastructure used by leading AI research organizations to improve the capabilities, reliability, and alignment of frontier language models.

Our team works directly on the datasets, evaluation systems, and feedback loops that determine how state-of-the-art models are trained, measured, and improved. As an early team member, you'll have unusual ownership, direct exposure to cutting‑edge research, and the opportunity to influence how next‑generation AI systems are developed.

The Opportunity

Most AI research focuses on models. We focus on the signals that shape them.

This role sits at the intersection of AI research, evaluation science, reinforcement learning, and data generation. You'll work on identifying model weaknesses, designing experiments to expose them, and creating the datasets and evaluation frameworks that help researchers push model performance forward.

Your work will directly impact large‑scale training runs and research decisions at some of the most advanced AI organizations in the world.

What You'll Work On
  • Design novel datasets that reveal meaningful model failure modes across domains such as reasoning, coding, finance, and enterprise workflows.
  • Develop evaluation frameworks that go beyond static benchmarks and provide actionable insights into model capabilities.
  • Create and refine reward signals, rubrics, and measurement systems for RLHF, RLAIF, and related post‑training methods.
  • Run experiments to understand how data quality, structure, and selection influence model behaviour.
  • Build quantitative approaches for measuring dataset quality, diversity, robustness, and downstream impact.
  • Investigate annotator behaviour and human‑feedback dynamics to improve training signal quality.
  • Partner closely with frontier AI research teams to translate high‑level research goals into concrete datasets and evaluation protocols.

Who We're Looking For

We're particularly excited about candidates who are early in their research careers but already demonstrate exceptional research ability and curiosity.

  • Undergraduate researchers with outstanding AI/ML research experience.
  • Master's students or graduates pursuing advanced work in machine learning, reinforcement learning, NLP, evaluation, or alignment.
  • Researchers from benchmarking, AI safety, or model evaluation organizations.
  • Individuals who have interned or worked in reinforcement learning, alignment, evaluation, or frontier‑model environments.

Strong Signals
  • Deep interest in understanding how data drives model behaviour.
  • Familiarity with LLM training, RLHF, reward modelling, evaluation methodologies, or alignment research.
  • Experience designing experiments and extracting insight from noisy or ambiguous results.
  • Strong quantitative reasoning and scientific thinking.
  • Comfort operating across multiple domains and problem spaces.
  • A builder mentality with a bias toward rapid experimentation and iteration.

Particularly Relevant Backgrounds

Experience with areas such as:

  • LLM evaluation and benchmarking
  • AI safety and alignment
  • Model auditing and failure analysis

Why This Role
  • Work on problems that directly influence how state‑of‑the‑art AI systems are trained and evaluated.
  • Collaborate with some of the world's leading AI research teams.
  • Operate with significant autonomy and ownership.
  • Join at an early stage where individual contributions have outsized impact.
  • Competitive compensation and meaningful equity participation.

If you're fascinated by why models succeed, fail, and improve—and you believe better data and evaluation are among the highest‑leverage problems in AI—this role offers a unique opportunity to shape the future of frontier model development.
