Policy Manager, Harmful Persuasion
Listed on 2026-02-06
Security
Information Security, Cybersecurity
About Anthropic
Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems.
About the Role
As a Safeguards Product Policy Manager for Harmful Persuasion, you will be responsible for developing, refining, and maintaining policies that prevent the misuse of AI systems for influence operations, harmful manipulation, and fraudulent behaviors. In this role, you will function as the policy owner for a range of harmful persuasion risks and shape policy frameworks across several policy areas, including election integrity, information integrity, and fraud.
As a member of the Safeguards team, your initial focus will be on translating the Harmful Persuasion risk framework into clear, enforceable policies, ensuring policy language addresses emerging threats identified by partner teams, and establishing guidelines that enable consistent enforcement decisions. This role may expand to include emerging manipulation vectors as AI capabilities advance. Safety is core to our mission, and you'll help ensure our policies prevent our products from being weaponized to undermine civic processes, exploit vulnerable populations, or degrade information ecosystems.
Important context for this role:
In this position you may be exposed to and engage with explicit content spanning a range of topics, including those of a sexual, violent, or psychologically disturbing nature.

Responsibilities:
- Develop and maintain comprehensive policy frameworks for harmful persuasion risks, especially in the context of election integrity, influence operations, and fraud
- Design clear, enforceable policy language that can be consistently applied by enforcement teams and translated into technical detection requirements
- Design and oversee the execution of evaluations to assess the model’s capability to leverage, produce, and execute deceptive and harmful persuasive techniques
- Write and refine external-facing Usage Policy language that clearly communicates policy violations and restrictions to users and external stakeholders
- Develop training guidelines, assessment rubrics, and evaluation protocols
- Validate enforcement decisions and automated assessments, providing qualitative analysis and policy guidance on complex edge cases
- Coordinate with external experts, civil society organizations, and academics to gather feedback on policy clarity and coverage
- Provide policy input on UX design for interventions, ensuring user-facing elements align with policy intent and minimize friction for legitimate use
- Contribute to model safety improvements in conjunction with the Fine-Tuning team
- Support regulatory compliance efforts including consultations related to the EU AI Act and other emerging AI governance frameworks
- Function as an escalation point for complex harmful persuasion cases requiring expert policy judgment
You may be a good fit if you have:
- 5+ years of experience in policy development, trust & safety policy, or platform policy, with working experience across the following: election integrity, fraud/scams, coordinated inauthentic behavior, influence operations, or misinformation
- General knowledge of the global regulatory landscape around election integrity, platform regulation, and digital services accountability
- Strong policy writing skills with the ability to translate complex risk frameworks into clear, enforceable guidelines
- Experience designing policies and workflows that enable both clear human enforcement decision-making and technical implementation in ML classifiers and detection pipelines
- Strong collaboration skills and extensive experience partnering effectively with Engineering, Data Science, Legal, and Policy teams on cross-functional initiatives
- Excellent written and verbal communication skills, with the ability to explain complex manipulation tactics and policy rationales to diverse audiences
- Strong familiarity with election integrity, political psychology, information integrity, and democratic resilience research
- Knowledge of persuasion theory, influence tactics, cognitive biases, and psychological manipulation techniques
- Experience working with EU institutions, regulatory bodies, or policy organizations on AI governance or digital platform regulation
- Experience conducting adversarial testing, red teaming, or vulnerability assessments for AI systems or platforms
- Familiarity with generative AI capabilities and understanding of how LLMs can be used for personalized persuasion, social…