Senior Machine Learning RE/RS

Job in New York City, Richmond County, New York, 10261, USA
Listing for: METR
Full Time position
Listed on 2025-11-30
Job specializations:
  • Software Development
    AI Engineer, Data Scientist
Salary/Wage Range or Industry Benchmark: 250,000 - 450,000 USD yearly
Job Description & How to Apply Below
Position: (Senior) Machine Learning RE/RS

Referral Bonus

We are offering a $21k referral bonus for this role. You can refer people through our form, which also lists the terms of the bonus.

About METR

We are a nonprofit research organization that develops scientific methods to assess AI capabilities, risks and mitigations, with a specific focus on threats related to autonomy, AI R&D automation, and alignment. Our work advances the science of AI measurement by understanding frontier AI systems' ability to complete complex tasks without human input, and directly executing those measurements to inform risk assessments and consensus within the AI industry, among policymakers, and among the public.

Our work has been cited by NIST, a previous US President, the UK Government, Nature, The New York Times, and Time Magazine. Our work with leading AI labs, governments, and academia ensures that our insights can quickly be leveraged to promote the safe development of increasingly powerful AI systems. We believe it is robustly good for civilization to have a clear understanding of what types of danger AI systems pose and how high the risk is, and we are extremely excited to find ambitious, excellent people to join our team and tackle one of the most important challenges of our time.

What We're Looking For

We’re looking for a combination of skills across “research science”, “research execution” and software engineering. You may not have all of these skills (for example, we don’t expect software engineering to be a large part of the role for narrowly focused researchers).

Research Science
  • You have strong knowledge of the relevant literature and of good general research practice.
  • You have a good understanding of how particular projects fit into METR's overall mission - you are thinking about things like "how will this generalize to future models", or "how does this relate to alignment evals".
  • You reliably notice important but subtle methodological limitations.
  • You are undaunted by open-ended mandates - you can take a confusing or ill-posed question and produce insightful and helpful frameworks / proposals / results.
  • You can write great papers.
Research Execution
  • You are an experienced executor/contributor; you are familiar with patterns of successful and unsuccessful execution in frontier ML research. You are undaunted by "I've never done this before" or even "no one has done this before".
  • Your total output is "team-sized" - you manage multiple people or run a project or are several times more productive as an IC than core staff.
  • You are creative, ambitious and entrepreneurial. You work fast and are highly responsive and available. You can juggle many balls when it is useful.
Software Engineering
  • You balance rapid prototyping with the creation of maintainable, scalable systems and make sound technical decisions.
  • You lead large projects from ideation to delivery, balancing innovative ML solutions with reliable, high-quality code.
  • You set high standards for system architecture, code quality, and maintainability, influencing broad software practices across the organization.

$250,000 - $450,000 a year

Foundational Evaluations Research
  • Identify the biggest limitations to current understanding of frontier model capabilities and propensities
  • Generate and rapidly derisk new methodologies and frameworks that can move the field forward
  • Ensure these are externally valid, connecting with our threat models and helping us better predict risk
  • Publish these as useful artifacts (datasets, environments, papers, model organisms) that the field can build on
  • Streamline methodologies for use in evaluation sprints or live dashboards
Evaluation Sprints and Iteration
  • As new models are developed, partner with labs to provide external oversight so that we have the ability to "sound the alarm" if risk levels are unacceptably high
  • Develop new techniques on the fly to deal with unexpected capabilities, behaviors or features of models
  • Spot subtle methodological flaws or missing evidence, ensuring our evaluations are trustworthy and rigorous
  • Draw conclusions about overall levels of risk, and communicate these clearly
  • Anticipate the methodologies and artifacts we’ll need to assess risk from future generations of models, and…
Position Requirements
10+ years of work experience