
Senior Product Manager, AI Agents Testing

in 10115 Berlin, Germany
Company: Zendesk GmbH (Germany)
Full-time position
Posted on 2026-06-15
Job specialization:
  • Software Development
    Artificial Intelligence Engineer
Salary range or industry benchmark: 100,000 - 125,000 EUR per year
Job Description

Zendesk AI Agents are fully autonomous agents that resolve customer issues end-to-end: reasoning over knowledge bases, executing multi-step procedures, taking actions via APIs, and handing off to humans when needed. They operate across messaging, email, and voice channels, handling millions of conversations for brands like Liberty London, Unity, and Motel Rocks. As these agents grow more capable and more autonomous, the stakes of every deployment decision increase: a misconfigured procedure, a hallucinated response, or a broken escalation path can erode customer trust. Today, the admins who configure and manage these agents (CX managers, bot builders, operations leads) lack the tools to confidently test agent behavior before going live, measure quality in production, or experiment with changes safely.

You’ll own the end-to-end product strategy for our Testing & Observability suite — the layer that lets admins simulate conversations against their real knowledge and procedures, score agent quality across accuracy, tone, and policy adherence, run A/B experiments on agent behavior, and catch regressions before they reach end users. This is a strategic opportunity that directly determines whether enterprises can trust and scale agentic AI in their customer service operations.

Key Responsibilities
  • Own product strategy and roadmap for AI agent testing – simulation, quality scoring, experimentation, regression detection, and conversation tracing.
  • Ship testing as an integrated experience embedded in the builder and deployment flow.
  • Define how simulation works end-to-end: scenario generation from real conversation patterns, automated pass/fail evaluation, and results that point admins to exactly what broke and where.
  • Build the experimentation layer – A/B testing of agent behavior, staged rollouts with statistical rigor, safe iteration on tone and resolution strategies.
  • Design a pre‑publish readiness gate that gives admins a quantified view of risk before every deployment – specific issues, coverage gaps, comparison to current production behavior.
  • Partner with ML, QA, and platform teams on scoring methodology, simulation infrastructure, and tracing architecture.
  • Make all of this usable by non‑technical admins – CX managers, bot builders, operations leads who need answers without writing code or filing engineering tickets.
Required Qualifications
  • Several years of product management experience, with 2+ years building for non‑technical users in complex technical domains (QA tooling, no‑code platforms, admin consoles, workflow builders) in B2B SaaS.
  • Experience shipping AI/ML products where evaluation and reliability were real concerns, not afterthoughts.
  • An understanding of why traditional testing doesn’t work for LLM‑based systems, and opinions about what does.
  • Ability to ship platform capabilities through user‑facing product surfaces – you don’t just build infrastructure, you make it usable.
  • Experience integrating acquired or adjacent products into a unified experience – combining capabilities from different teams, codebases, or organizations into something that feels like one product.
  • Track record coordinating across 3+ engineering teams and multiple departments to deliver one coherent product experience.
Bonus Qualifications
  • Experience building simulation, synthetic data, or automated testing products.
  • Background in conversational AI, chatbot platforms, or customer service technology.
  • Familiarity with LLM evaluation approaches – human‑in‑the‑loop scoring, automated rubrics, AI‑as‑judge.
  • Experience with experimentation infrastructure – A/B testing, staged rollouts, feature flagging at scale.
  • Experience turning internal prototypes into customer‑facing products.
Success in the Role

Testing becomes part of how customers build and deploy agents – not something they do separately, but part of the flow. Customers can quantify whether their agent is ready to go live, and catch regressions before end‑users hit them. Automated resolution rates improve because customers can actually diagnose and fix quality issues instead of guessing. The testing platform becomes a shared capability used beyond AI Agents – consumed by other product teams…

Job Requirements
10+ years of professional experience