AI Research Engineer Job, San Francisco area, California, USA, Research/Development

AI Researcher – Data, Evaluation & Alignment

We're building critical infrastructure used by leading AI research organizations to improve the capabilities, reliability, and alignment of frontier language models.

Our team works directly on the datasets, evaluation systems, and feedback loops that determine how state-of-the-art models are trained, measured, and improved. As an early team member, you'll have unusual ownership, direct exposure to cutting‑edge research, and the opportunity to influence how next‑generation AI systems are developed.

The Opportunity

Most AI research focuses on models. We focus on the signals that shape them.

This role sits at the intersection of AI research, evaluation science, reinforcement learning, and data generation. You'll work on identifying model weaknesses, designing experiments to expose them, and creating the datasets and evaluation frameworks that help researchers push model performance forward.

Your work will directly impact large‑scale training runs and research decisions at some of the most advanced AI organizations in the world.

What You'll Work On

Design novel datasets that reveal meaningful model failure modes across domains such as reasoning, coding, finance, and enterprise workflows.
Develop evaluation frameworks that go beyond static benchmarks and provide actionable insights into model capabilities.
Create and refine reward signals, rubrics, and measurement systems for RLHF, RLAIF, and related post‑training methods.
Run experiments to understand how data quality, structure, and selection influence model behaviour.
Build quantitative approaches for measuring dataset quality, diversity, robustness, and downstream impact.
Investigate annotator behaviour and human‑feedback dynamics to improve training signal quality.
Partner closely with frontier AI research teams to translate high‑level research goals into concrete datasets and evaluation protocols.

Who We're Looking For

We're particularly excited about candidates who are early in their research careers but already demonstrate exceptional research ability and curiosity.

Undergraduate researchers with outstanding AI/ML research experience.
Master's students or graduates pursuing advanced work in machine learning, reinforcement learning, NLP, evaluation, or alignment.
Researchers from benchmarking, AI safety, or model evaluation organizations.
Individuals who have interned or worked in reinforcement learning, alignment, evaluation, or frontier‑model environments.

Strong Signals

Deep interest in understanding how data drives model behaviour.
Familiarity with LLM training, RLHF, reward modelling, evaluation methodologies, or alignment research.
Experience designing experiments and extracting insight from noisy or ambiguous results.
Strong quantitative reasoning and scientific thinking.
Comfort operating across multiple domains and problem spaces.
A builder mentality with a bias toward rapid experimentation and iteration.

Particularly Relevant Backgrounds

Experience in areas such as:

LLM evaluation and benchmarking
AI safety and alignment
Model auditing and failure analysis

Why This Role

Work on problems that directly influence how state‑of‑the‑art AI systems are trained and evaluated.
Collaborate with some of the world's leading AI research teams.
Operate with significant autonomy and ownership.
Join at an early stage where individual contributions have outsized impact.
Receive competitive compensation and meaningful equity participation.

If you're fascinated by why models succeed, fail, and improve, and you believe better data and evaluation are among the highest‑leverage problems in AI, this role offers a direct way to shape frontier model development.
