AI Evaluation TPM — Cross-Functional Impact Job London area,Greater London England UK,IT/Tech

Location: Greater London

About the role:

We are seeking a Technical Program Manager to lead our AI model evaluation initiatives across multiple work streams. This role will be crucial in assessing the performance, capabilities, limitations, and potential risks of our AI models. Working closely with our Research, Trust & Safety, Frontier Red Teaming, and Policy teams, you will drive high-priority evaluation projects to build new processes, align metrics with policy, and track measurable progress.

You will help build and adapt the model evaluation program to ensure model deployments are rigorous and aligned with our commitment to responsible AI development. The ideal candidate will have a strong technical background and experience managing cross-functional programs in AI development, ML engineering, or related fields.

You’ll be joining a team of Technical Program Managers who own and drive cross-functional programs that align with the company’s top priorities. In this role, you’ll have the opportunity to make a foundational impact as you contribute to the scaling of a centralized TPM function for the company.

Extremely strong soft skills are paramount, as our team is front and center in driving many company-wide changes and top-priority initiatives that require generating buy-in, balancing differing opinions, and competing for attention in our rapidly scaling environment.

This role is a great fit for someone who has both seen excellence at scale and operated in rapidly scaling teams with highly ambiguous scope. We are seeking candidates with deep TPM expertise who are also comfortable acting as adaptable generalists and adding value quickly.

We excel at maintaining a broad view of our work while diving deep into the details when necessary. We understand business goals, translate and organize them into technical programs and projects, and drive execution. We are adept at engaging with both non-technical and technical stakeholders at all levels of the company, including executive leadership.

In this role, you will have the opportunity to shape the development of advanced AI systems and contribute to Anthropic's mission of ensuring that AI benefits all of humanity. If you are passionate about responsible AI development, have a strong technical background, and thrive in a fast-paced, collaborative environment, we'd love to hear from you.

Responsibilities:

Partner with teams like Frontier Risk Evaluations, Security, and Trust & Safety to develop and implement comprehensive evaluation protocols for our latest frontier AI models
Build a single source of truth for tracking all types of model evaluations as required by our Responsible Scaling Policy, AI safety institutes, the White House, and others
Develop and maintain procedures for conducting evaluations, including designing test suites, coordinating red team exercises, and analyzing results
Create and manage dashboards and reporting systems to track model performance, safety metrics, and evaluation outcomes across different AI systems and versions
Lead cross-functional workshops to identify potential risks and edge cases for evaluation, ensuring thorough coverage of AI capabilities and limitations
Coordinate with external partners and industry standards bodies to align our evaluation practices with emerging best practices in responsible AI development
Provide detailed status reports, identifying technical risks, dependencies, and areas requiring additional support
Facilitate communication and coordination between technical work streams and stakeholders
Continuously identify opportunities for technical process improvements and implement changes as needed
Stay up to date with the latest developments in AI safety, ML engineering, and related fields to ensure the program remains at the forefront of responsible AI development

You might be a good fit if you:

Have several years of experience in technical program management, with a track record of successfully delivering complex technical programs, preferably in AI development, ML engineering, or related fields
Have experience executing technical programs that require systems-level and engineering-level knowledge
Have exceptionally strong interpersonal and communication…