Research Lead Job Berkeley Springs West Virginia USA,Research/Development

Location: Berkeley Springs

FAR.AI is hiring a Research Lead to develop and lead a research agenda that reduces catastrophic risks from advanced AI. You'll build and lead a team executing this agenda - setting research direction, mentoring Members of Technical Staff to scale your vision, and remaining hands-on enough to write code and run experiments yourself. What counts is whether AI labs and governments actually change how they act; publications are useful but aren't the measure.

Beyond your team, you can shape FAR.AI's broader work by directing millions of dollars in grants to external researchers extending your agenda, convening the people who can act on it, and influencing our independent testing and advising of AI companies and governments. This role suits you if you want high autonomy in an impact-driven environment, pursuing empirically grounded, scalable ML safety work.

About Us

FAR.AI is a non-profit AI research institute working to ensure advanced AI is safe and beneficial for everyone. Our mission is to facilitate breakthrough AI safety research, advance global understanding of AI risks and solutions, and foster a coordinated global response.

Since our founding in July 2022, we've grown to 40+ staff, published 40+ academic papers, and convened leading AI safety events. Our work is recognized globally, with publications at premier venues such as NeurIPS, ICML, and ICLR, and features in the Financial Times, Nature News and MIT Technology Review. We conduct pre-deployment testing on behalf of frontier developers such as OpenAI and independent evaluations for governments including the EU AI Office.

We help steer and grow the AI safety field through developing research roadmaps with renowned researchers such as Yoshua Bengio; running FAR.Labs, an AI safety-focused co-working space in Berkeley housing 40 members; and supporting the community through targeted grants to technical researchers.

About FAR.Research

We explore promising research directions in AI safety and scale up only those showing a high potential for impact. Once the core research problems are solved, we work to scale the solutions to a minimum viable prototype, demonstrating their validity to AI companies and governments to drive adoption.

Our recent and ongoing research includes:

Adversarial Robustness: working to rigorously solve security problems by building a science of security and robustness for AI, from demonstrating that superhuman systems can be vulnerable, to scaling laws for robustness and jailbreaking constitutional classifiers.

Mechanistic Interpretability: finding issues with Sparse Autoencoders, probing deception using Among Us, understanding learned planning in Sokoban, and interpretable data attribution.

Red-teaming: conducting pre- and post-release adversarial evaluations of frontier models (e.g. Claude 4 Opus, ChatGPT Agent, GPT-5); developing novel attacks to support this work.

Evals: developing evaluations for new threat models, e.g. persuasion and tampering risks.

Mitigating AI deception: studying when lie detectors induce honesty or evasion, and developing approaches to deception and sandbagging.

We are particularly looking to add Research Leads in the following pod shapes:

Applied Interpretability - using interpretability to tackle concrete safety problems (better probes, backdoor detection, deception monitoring), aiming for fast feedback loops, often in collaboration with our other pods. This is a new, greenfield pod.
Scalable Oversight / Alignment - methods that keep oversight robust as models become more capable than their supervisors: recursive reward modeling, debate, weak-to-strong generalization, process-based supervision.
Adversarial Robustness - extending our independent-testing work into deployed-system protection: better safety guardrails, pre-training safety interventions (initially CBRN misuse, especially for open-weight models), backdoor detection and mitigation, realistic cybersecurity evaluations, and loss-of-control deception evaluations.
Auditing / Evals - safety and alignment auditing: evaluation awareness (construct validity, safety-relevance, hyper-realistic evals), CoT monitorability and faithfulness training, black-box monitoring…