Senior Machine Learning RE/RS Job, New York City area, New York, USA, Software Development

Position: (Senior) Machine Learning RE/RS

Referral Bonus

We are offering a $21k referral bonus for this role. You can refer people through our form, which also lists the terms of this bonus.

About METR

We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks, and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. Our work advances the science of AI measurement by understanding frontier AI systems' ability to complete complex tasks without human input, and by directly executing those measurements to inform risk assessments and consensus within the AI industry, among policymakers, and among the public.

Our work has been cited by NIST, a previous US President, the UK Government, Nature, The New York Times, and Time Magazine. Our work with leading AI labs, governments, and academia ensures that our insights can quickly be leveraged to promote the safe development of increasingly powerful AI systems. We believe it is robustly good for civilization to have a clear understanding of what types of danger AI systems pose and how high the risk is, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.

What We're Looking For

We’re looking for a combination of skills across “research science”, “research execution” and software engineering. You may not have all of these skills (for example, we don’t expect software engineering to be a large part of the role for narrowly focused researchers).

Research Science

You have strong knowledge of the relevant literature and of good research practice in general.
You have a good understanding of how particular projects fit into METR's overall mission - you are thinking about things like "how will this generalize to future models", or "how does this relate to alignment evals".
You reliably notice important but subtle methodological limitations.
You are undaunted by open-ended mandates - you can take a confusing or ill-posed question and produce insightful and helpful frameworks / proposals / results.
You can write great papers.

Research Execution

You are an experienced executor/contributor; you are familiar with patterns of successful and unsuccessful execution in frontier ML research. You are undaunted by "I've never done this before" or even "no one has done this before".
Your total output is "team-sized" - you manage multiple people, run a project, or are several times more productive as an IC than core staff.
You are creative, ambitious and entrepreneurial. You work fast and are highly responsive and available. You can juggle many balls when it is useful.

Software Engineering

You balance rapid prototyping with the creation of maintainable, scalable systems and make sound technical decisions.
You lead large projects from ideation to delivery, balancing innovative ML solutions with reliable, high-quality code.
You set high standards for system architecture, code quality, and maintainability, influencing broad software practices across the organization.

$250,000 - $450,000 a year

Foundational Evaluations Research

Identify the biggest limitations to current understanding of frontier model capabilities and propensities
Generate and rapidly derisk new methodologies and frameworks that can move the field forward
Ensure these are externally valid, connecting with our threat models and helping us better predict risk
Publish these as useful artifacts (datasets, environments, papers, model organisms) that the field can build on
Streamline methodologies for use in evaluation sprints or live dashboards

Evaluation Sprints and Iteration

As new models are developed, partner with labs to provide external oversight so that we have the ability to "sound the alarm" if risk levels are unacceptably high
Develop new techniques on the fly to deal with unexpected capabilities, behaviors or features of models
Spot subtle methodological flaws or missing evidence, ensuring our evaluations are trustworthy and rigorous
Draw conclusions about overall levels of risk, and communicate these clearly
Anticipate the methodologies and artifacts we’ll need to assess risk from future generations of models, and…

