Policy Manager, Harmful Persuasion
Listed on 2026-02-06
Security
Information Security, Cybersecurity
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Role
As a Safeguards Product Policy Manager for Harmful Persuasion, you will be responsible for developing, refining, and maintaining policies that prevent the misuse of AI systems for influence operations, harmful manipulation, and fraudulent behaviors. In this role, you will function as the policy owner for a range of harmful persuasion risks and shape policy frameworks across several policy areas, including election integrity, information integrity, and fraud.
As a member of the Safeguards team, your initial focus will be on translating the Harmful Persuasion risk framework into clear, enforceable policies, ensuring policy language addresses emerging threats identified by partner teams, and establishing guidelines that enable consistent enforcement decisions. This role may expand to include emerging manipulation vectors as AI capabilities advance. Safety is core to our mission, and you'll help ensure our policies prevent our products from being weaponized to undermine civic processes, exploit vulnerable populations, or degrade information ecosystems.
Important context for this role:
In this position you may be exposed to and engage with explicit content spanning a range of topics, including those of a sexual, violent, or psychologically disturbing nature.

Responsibilities:
- Develop and maintain comprehensive policy frameworks for harmful persuasion risks, especially in the context of election integrity, influence operations, and fraud
- Design clear, enforceable policy language that can be consistently applied by enforcement teams and translated into technical detection requirements
- Design and oversee the execution of evaluations to assess the model’s capability to leverage, produce, and execute deceptive and harmful persuasive techniques
- Write and refine external-facing Usage Policy language that clearly communicates policy violations and restrictions to users and external stakeholders
- Develop training guidelines, assessment rubrics, and evaluation protocols
- Validate enforcement decisions and automated assessments, providing qualitative analysis and policy guidance on complex edge cases
- Coordinate with external experts, civil society organizations, and academics to gather feedback on policy clarity and coverage
- Provide policy input on UX design for interventions, ensuring user-facing elements align with policy intent and minimize friction for legitimate use
- Contribute to model safety improvements in conjunction with the Fine-Tuning team
- Support regulatory compliance efforts including consultations related to the EU AI Act and other emerging AI governance frameworks
- Function as an escalation point for complex harmful persuasion cases requiring expert policy judgment
You may be a good fit if you have:
- 5+ years of experience in policy development, trust & safety policy, or platform policy, with working experience across the following: election integrity, fraud/scams, coordinated inauthentic behavior, influence operations, or misinformation
- General knowledge of the global regulatory landscape around election integrity, platform regulation, and digital services accountability
- Strong policy writing skills with the ability to translate complex risk frameworks into clear, enforceable guidelines
- Experience designing policies and workflows that enable both clear human enforcement decision-making and technical implementation in ML classifiers and detection pipelines
- Strong collaboration skills and extensive experience partnering effectively with Engineering, Data Science, Legal, and Policy teams on cross-functional initiatives
- Excellent written and verbal communication skills, with the ability to explain complex manipulation tactics and policy rationales to diverse audiences
- Strong familiarity with election integrity, political psychology, information integrity, and democratic resilience research
- Knowledge of persuasion theory, influence tactics, cognitive biases, and psychological manipulation techniques
- Experience working with EU institutions, regulatory bodies, or policy organizations on AI governance or digital platform regulation
- Experience conducting adversarial testing, red teaming, or vulnerability assessments for AI systems or platforms
- Familiarity with generative AI capabilities and understanding of how LLMs can be used for personalized persuasion, social…