Head of Research, San Francisco area, California, USA (Research/Development)

Measuring intelligence is hard, and humans haven't been particularly good at it. The proxies we've used (IQ, standardized tests, credentials) have shaped how we develop intelligence and how we value it, often in ways we later regret. AI gives us a chance to do better. The field is young enough that the methodologies for measuring what these systems can actually do are still being written, and the answers we settle on will shape what gets built, what gets deployed, and which workflows get automated next.

Vals is building the measurement layer for the AI economy: the benchmarks, methodologies, and standards that determine which models ship and where they get trusted. We're hiring a Head of Research to lead it.

Responsibilities

Concretely, you'll:

Advance the science of evaluation. The methodologies the field uses today — judge models, human-in-the-loop, static benchmarks — were built for a previous generation of models and break down on long‑horizon, real‑world tasks. You'll develop the new paradigms.
Oversee Vals' broader research portfolio, setting direction across the projects already underway and the ones we haven't started yet.
Publish work that moves the field forward. We want Vals' research to be cited, not just shipped.
Recruit and grow a research team alongside the founders.
Work directly with our enterprise customers and lab partners on the evaluation problems they actually have.

Requirements

A PhD in ML/NLP (in progress or completed), or equivalent industry research track record.
Deep familiarity with the LLM evaluation landscape: existing benchmarks, their failure modes, judge‑model approaches, human‑in‑the‑loop methodologies.
A bias toward research that affects what people actually deploy, rather than benchmarks that are easy to game.
Strong written and verbal communication. You'll publish, present, and talk to customers and labs.
Ability to work in‑person, in San Francisco.

Nice to Haves

A widely‑cited benchmark or eval framework you've built or co‑built.
Prior experience at a frontier lab (Anthropic, OpenAI, Google DeepMind, Meta FAIR) or a research‑led startup.
Domain depth in one or more of our verticals (legal, finance, insurance, healthcare).
Experience leading or mentoring other researchers.
A public research presence: papers, blog posts, talks, or open‑source contributions people in the field recognize.

What We Offer

Highly competitive salary and equity. Excellence is well rewarded.
Relocation and transportation support.
Health/dental insurance coverage.
Lunch and dinner provided, free snacks/coffee/drinks.
401(k) plan.
Unlimited PTO.
