AI Quality Engineer | Vancouver area, BC, Canada | Software Development

Key Responsibilities

Build and maintain automated tests for AI agent workflows, APIs, tools, telemetry-backed analysis, remediation flows, and ticketing behavior.
Design evaluation suites for LLM and agentic behavior, including expected-answer checks, rubric-based grading, regression datasets, tool-call validation, and safety/approval checks.
Use or help implement evaluation frameworks such as Pydantic Evals / Pydantic AI, Strands Evals, LangSmith, DeepEval, Ragas, promptfoo, or similar tools.
Validate multi-turn support scenarios, clarification flows, knowledge retrieval, script/remediation recommendations, escalation paths, and failure handling.
Test on-device agent behavior where needed, including Windows service/tray behavior, telemetry collection, anomaly detection, local remediation handoff, logs, and resource impact.
Debug quality issues directly by reading logs, tracing requests, reproducing failures, and making small code/test changes without heavy engineering hand-holding.
Partner with engineering and product to define release gates, quality metrics, evaluation rubrics, and confidence thresholds for pilot readiness.
Contribute to CI quality checks, test fixtures, mocked integrations, regression suites, and test data management.
Identify risks in AI behavior, including hallucinated diagnosis, unsafe remediation suggestions, missing consent, weak ticket summaries, brittle tool use, and poor escalation behavior.

Requirements

4+ years of experience in QA engineering, SDET, test automation, software engineering, or similar hands-on quality roles.
Strong Python experience, including writing production-quality tests and debugging application code.
Experience testing backend services, APIs, async workflows, integrations, logs, and distributed systems.
Demonstrated ability to work independently in a codebase, identify root causes, and make targeted fixes or test improvements.
Experience with modern test frameworks such as pytest, unittest, Playwright, xUnit, or similar.
Comfortable testing ambiguous, non-deterministic AI behavior using datasets, rubrics, assertions, metrics, and regression baselines.
Ability to distinguish product bugs, prompt/model behavior issues, data issues, integration failures, and test harness problems.
Strong written communication for documenting repro steps, risks, test plans, and release readiness.

Preferred Qualifications

Direct experience with Pydantic, Pydantic AI, or Pydantic Evals.
Experience testing LLM applications, agentic systems, RAG, tool calling, MCP servers, or AI assistants.
Experience building eval datasets, LLM-as-judge flows, deterministic evaluators, and regression dashboards.
Experience with Windows endpoints, device telemetry, PowerShell, .NET, MQTT, or on-device agent testing.
Experience with IT support workflows, ServiceNow, ticketing, endpoint management, or enterprise support tooling.
Ability to contribute small application changes, not only test changes.
