
Member of Technical Staff — Data Quality Operations

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Patronus AI
Full Time position
Listed on 2026-03-06
Job specializations:
  • IT/Tech
    Data Analyst, Data Scientist
Salary/Wage Range: $80,000 – $120,000 USD per year
Job Description


About Patronus AI

Patronus AI is a frontier lab developing simulation research and infrastructure to accelerate progress toward human-aligned AGI. We are on a mission to simulate all of the world’s intelligence.

We are the team behind some of the earliest and most influential research in AI evaluation, including FinanceBench, Lynx, SimpleSafetyTests, CopyrightCatcher, Humanity’s Last Exam, and more. We are former AI researchers and engineers from companies like Meta AI, Amazon AGI, and Google. Our customers include foundation model labs and Fortune 500 enterprises like Adobe. We are backed by top-tier investors including Lightspeed Venture Partners, Notable Capital, Stanford University, Noam Brown, Gokul Rajaram, and more.

Responsibilities

We are looking for a Member of Technical Staff — Data Quality Operations to bridge the gap between model evaluation, data generation, and engineering execution. At Patronus, “data quality” isn’t just about catching issues downstream; it’s about building a measurable, repeatable system that ensures our frontier evaluation datasets and tasks are correct, diverse, and customer‑ready before they ever reach QA.

You will have end‑to‑end ownership of the pre‑QA quality layer, establishing technical standards across diverse environments and conducting deep‑dive analyses to preemptively identify systemic issues. By converting these insights into high‑impact improvements for task generation pipelines, evaluation rubrics, and internal tooling, you will ensure our data remains the industry gold standard.

Working in lockstep with our Head of Operations, you will operationalize these quality benchmarks across internal teams and customer engagements to guarantee predictable, high‑fidelity delivery. Furthermore, you will collaborate with the Platform team to design the instrumentation and automation necessary to transform these manual quality gates into a frictionless, scalable infrastructure.

In this role, you will:

  • Define cross‑environment data quality standards and implement pre‑QA analyses and gates that catch issues early (e.g., duplication, diversity, tool coverage, difficulty calibration, rubric compliance). Maintain a consistent baseline across environments with configurable checks based on customer feedback.
  • Analyze SOTA runs, execution traces, and dataset artifacts to build a clear taxonomy of failure modes and quality gaps. Translate patterns into actionable data requirements (new tasks, edge cases, hard negatives, and distribution fixes).
  • Partner with the Head of Operations to turn quality standards into an operating cadence—ownership, SLAs, escalation paths, vendor feedback loops, and release gates—so quality is enforced consistently across environments and teams.
  • Convert quality findings into engineering‑ready tickets and partner with Environment, Frontend, Tooling, and Platform teams to drive fixes to verified closure—improving generators, validators, dashboards, and automated checks.
  • Maintain ship gates and release notes for datasets/tasks. Own quality metrics, resolved issues, and versioned snapshots to ensure every release aligns with customer acceptance criteria.
  • Track quality signals like defect rates (blocker/major), rework, cycle time, and throughput. Slice by domain, task type, tool, and vendor to surface trends early and prevent regressions.
  • Drive fixes upstream by improving rubrics, task generation methods, and tooling. You won’t just detect the same issue twice—you’ll build systems that prevent it from recurring.
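As a rough illustration of the pre‑QA gates described above (catching duplicate tasks and rubric‑compliance gaps before work reaches human QA), here is a minimal sketch; the record schema, field names, and checks are hypothetical and not Patronus AI’s actual format or pipeline:

```python
# Hypothetical sketch of a pre-QA quality gate over a batch of task records.
# The schema (prompt, rubric fields) is illustrative only.

def pre_qa_gate(tasks, required_rubric_fields=("criteria", "max_score")):
    """Return (passed, issues): flag exact-duplicate prompts (after
    normalization) and missing rubric fields before human QA."""
    issues = []
    seen = {}  # normalized prompt -> index of first occurrence
    for i, task in enumerate(tasks):
        prompt = task.get("prompt", "").strip().lower()
        if prompt in seen:
            issues.append((i, f"duplicate of task {seen[prompt]}"))
        else:
            seen[prompt] = i
        missing = [f for f in required_rubric_fields
                   if f not in task.get("rubric", {})]
        if missing:
            issues.append((i, f"rubric missing fields: {missing}"))
    return (len(issues) == 0, issues)

batch = [
    {"prompt": "Summarize the 10-K filing.",
     "rubric": {"criteria": "...", "max_score": 5}},
    {"prompt": "summarize the 10-K filing. ",
     "rubric": {"criteria": "..."}},
]
passed, issues = pre_qa_gate(batch)
# passed is False: task 1 is a normalized duplicate of task 0
# and its rubric is missing "max_score"
```

In a real pipeline, checks like these would run per environment with configurable thresholds (e.g., near-duplicate similarity, diversity, difficulty calibration) rather than exact matching alone.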

Qualifications

Above all, we look for an eagerness to learn, a passion for research, creativity in problem solving, and a proactive mindset. You are a great fit if you have a background in the following:

  • 3+ years of experience in Data Ops, QA Ops, Program Ops, or Technical Ops within a production, tooling‑heavy environment.
  • Proven ownership of complex workflows across QA and engineering (from triage and assignment to final verification).
  • Experience performing evaluation or model error analysis and converting those insights into actionable data specifications.
  • Strong ability to write clear acceptance criteria and the backbone to enforce ship/no‑ship gates.
  • High integrity, proactive mindset, and a passion for building reliable AI.

Nice to Haves

  • Experience with RLHF, tool‑use agents, or simulated/agentic environments.
  • Background in vendor quality management and calibration/audit systems.

Benefits

  • Competitive salary and equity packages
  • Health, dental, and vision insurance plans