Director, Agentforce Testing Center Engineering
Listed on 2026-02-16
IT/Tech
AI Engineer, Machine Learning/ ML Engineer
Overview
To get the best candidate experience, please consider applying for a maximum of 3 roles within 12 months to ensure you are not duplicating efforts.
Job Category:
Software Engineering
Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn't a buzzword - it's a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all.
Ready to level-up your career at the company leading workforce transformation in the agentic era? You're in the right place! Agentforce is the future of AI, and you are the future of Salesforce.
Opportunity Description
We're Salesforce, the Customer Company, inspiring the future of business with AI + Data + CRM. Leading with our core values, we help companies across every industry blaze new trails and connect with customers in a whole new way. And, we empower you to be a Trailblazer, too - driving your performance and career growth, charting new paths, and improving the state of the world.
If you believe in business as the greatest platform for change and in companies doing well and doing good - you've come to the right place.
We are looking for a technical leader who understands that building an AI agent is only 10% of the work; the real engineering challenge is measuring it. We need a thought leader who can solve the "problem nobody talks about": evaluating non-deterministic agentic systems in production. You will lead the team responsible for defining what "good" looks like for agents, moving beyond basic accuracy to rigorous evals that connect agent specs to business outcomes.
You will thread together Applied Science (defining metrics, curating golden datasets, establishing ground truth) and Product Engineering (shipping software).
- Build the "Evaluation Core":
Lead the engineering of a scalable evaluation platform that runs in parallel with agent execution.
- Thread Science & Engineering:
Operationalize applied science by turning theoretical benchmarks into production regression tests, and establish a discipline of eval-driven development.
- Thought Leadership:
Act as the internal SME for AI testing. Educate cross-functional partners (Product, UX, ML) on the difference between stochastic AI behavior and traditional deterministic software.
- You are an Engineering leader who can guide the group through technical leadership and process management, and who maintains a discipline of high-quality code delivery, aided by AI tools as necessary.
- You are a People leader who ensures teams have clear priorities and adequate resources. You are a multiplier with a passion for the success of the team and its members, providing technical guidance, career development, and mentoring.
- Specialized Agent Evaluation Experience:
You have specific experience building evaluation harnesses for LLMs or Agents.
- Applied Science & Engineering Hybrid:
You have a track record of managing "Research Engineering" or "Applied Science" teams where you had to operationalize vague scientific goals into shipping code. You are comfortable curating "Golden Sets" of data and building custom benchmarks from scratch.
- Deep Knowledge of Eval Methodologies:
You are fluent in modern evaluation techniques, including:
- LLM-as-a-Judge:
Validating judges against human ground truth to prevent self-bias.
- Behavioral Analysis:
Evaluating how an agent thinks (Reasoning Traces/Chain of Thought), not just the final output.
- Production-Grade AI Experience:
You have shipped AI products where you had to manage real-world constraints like token budgets, inference latency, and cost-normalized accuracy. You take a pragmatic approach to building ML solutions that work in production at scale.
- Familiarity with academic and industry benchmarks and their limitations in a business environment.
- Experience building simulation environments (mock APIs, virtual users) to stress-test agents safely before deployment.
- Experience with Data engineering, specifically around…