Member of Technical Staff Job, Berkeley Springs, West Virginia, USA (Research/Development)

Location: Berkeley Springs

NOTE: If you previously applied to one of our Research Engineer/Scientist, Machine Learning Research Engineer/Scientist, or Research Stream Lead roles, you do not need to apply again. We are merging all inbound applications for researcher roles into this one.

About METR

We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment.

We believe it is robustly good for civilization to have a clearer understanding of what dangers AI systems pose, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.

What We're Looking For

METR currently has 4 primary research streams:

Capabilities: Accurately measuring frontier model performance on threat-relevant tasks (autonomy, AI R&D automation, etc.) and predicting future capabilities. We develop and maintain benchmarks, diverse evidence-gathering methods, and metrics to track capability trends and anticipate the thresholds that matter most for safety.
Monitorability: Understanding how well frontier models can take subversive or unwanted actions despite various monitoring or control protocols. We build the research infrastructure - novel metrics, control evaluations, elicitation methods - needed to improve the world's understanding of how effectively current and future models can circumvent oversight.
Alignment/Propensity: Determining whether a model that is capable of causing catastrophic harm in its actual deployment setting would be likely to do so in a given high-stakes deployment. We aim to develop the science of propensity evaluations and examine when we might expect high-stakes catastrophic misalignment.
Evaluation Execution: Productionizing, improving, and executing our various evaluations. We streamline our processes and build common infrastructure to scale our ability to continually run our most up-to-date evaluations on the latest models. This stream is focused more on engineering than research.

The Capabilities stream is looking specifically for a senior research engineer. The Monitorability stream is hiring general research ICs (individual contributors). The Alignment/Propensity and Evaluation Execution streams are hiring for a Stream Lead and ICs. The stream you end up joining will be based on a combination of working fit and interest.

For our research IC roles, we are looking for a combination of skills across "research science", "research execution" and software engineering. You may not have all of these skills (for example, we don't expect software engineering to be a large part of the role for narrowly focused researchers). For the Stream Lead role, we are additionally looking for research management skills (this applies more to the Alignment/Propensity stream than to the Evaluation Execution stream).

Research Science

You have strong knowledge of the relevant literature and of good general research practice.
You have a good understanding of how particular projects fit into METR's overall mission - you are thinking about things like "how will this generalize to future models", or "how does this relate to alignment evals".
You reliably notice important but subtle methodological limitations.
You are undaunted by open-ended mandates - you can take a confusing or ill-posed question and produce insightful and helpful frameworks / proposals / results.
You can write great papers.

Research Execution

You are an experienced executor/contributor; you are familiar with patterns of successful and unsuccessful execution in frontier ML research. You are undaunted by "I've never done this before" or even "no one has done this before".
You are creative, ambitious and entrepreneurial. You work fast and are highly responsive and available. You can juggle many balls when it is useful.

Software Engineering

You balance rapid prototyping with the creation of maintainable, scalable systems and make sound technical decisions.
You lead large projects from ideation to delivery, balancing innovative ML solutions with reliable, high-quality code.
You set high standards for system architecture, code quality, and maintainability, influencing broad software practices across the organization.

Research Management

You can set a compelling and coherent research agenda for a team - you translate high-level goals into tractable projects that serve METR's mission.
You have experience hiring and developing researchers; you have good judgment about what makes someone excellent, and you actively invest in the growth of your team members.
You can hold a team to high standards while being comfortable with fast-moving/scrappy research workflows - you know when to push for rigor and when to ship, and you help others develop this judgment too.
You are an effective technical manager - you can meaningfully evaluate the work of researchers across a range of subproblems, give useful feedback, and catch things…