AI Quality Engineer
Toronto, Ontario, C6A, Canada
Listed on 2026-06-21
Software Development
AI QA / Validation Engineer, AI Engineer (Applied/Software), Software Testing
About Rootly
At Rootly, we are on a mission to be the go-to way companies respond when things go wrong, helping every organization be more reliable. We build an industry‑leading incident management platform that enables companies worldwide to resolve incidents consistently and quickly. We are carving an entirely new +$B segment ourselves and need incredible talent to achieve this ambitious goal together.
Customers love Rootly. Some of the fastest-growing companies in the world, including NVIDIA, Figma, Canva, Tripadvisor, Squarespace and more, rely on Rootly to power their critical incident management process. They obsess over our delightful enterprise‑ready platform and unique partnership model. See why our customers have rated us 5 stars on G2.
Investors love Rootly. We are backed by respected funds around the world, from Y Combinator to operators like the CTO of Dropbox and GitHub. We conduct monthly financial reviews as a team so everyone has a pulse on the health of the business, and we publish what we are building in our weekly changelog.
AI Quality Engineer: Role Overview
Rootly is building the AI‑native future of incident management, and we need someone who can push our AI to its limits before our customers do. As an AI Quality Engineer, you will own the evaluation and optimization of Rootly’s agentic AI features – designing test scenarios, running adversarial prompts, interpreting outputs, and working directly with engineering and product to close the loop on performance.
What You’ll Do
- Design and execute prompt‑based test scenarios that cover happy paths, edge cases, and adversarial inputs across Rootly’s agentic AI features.
- Evaluate AI outputs for accuracy, relevance, consistency, and alignment with expected workflow behaviour.
- Build and maintain an evaluation framework: structured test libraries, scoring rubrics, and regression suites to track AI performance over time.
- Identify failure modes, hallucinations, reasoning gaps, and unexpected agent behaviours; document findings and work with engineers to resolve them.
- Partner with Product and Engineering on new AI feature releases, contributing to acceptance criteria and quality gates before launch.
- Define and track quality metrics (accuracy rates, failure frequency, regression trends) and report findings to stakeholders.
- Stay current on LLM evaluation techniques, prompt engineering best practices, and agentic testing methodologies.
Qualifications
- 5+ years in QA, product operations, AI/ML evaluation, or a closely related role.
- Hands‑on experience testing or evaluating LLM‑powered or agentic AI products.
- Strong prompt engineering instincts – you understand how wording, context, and structure affect model behaviour.
- Comfortable writing scripts or working with evaluation tools (Python a plus; not required to be a full‑stack engineer).
- Sharp analytical thinking; you can spot a subtle reasoning failure and articulate exactly why it’s a problem.
- Clear written communicator; able to translate AI behaviour findings for both technical and non‑technical audiences.
- Familiarity with incident management, DevOps, or IT operations workflows is a strong asset.
- Experience with evaluation frameworks (e.g. LangSmith, Prompt Flow, Braintrust, or similar).
- Exposure to red‑teaming or adversarial testing of AI systems.
- Comfortable writing E2E tests with Playwright.
- Background working at a B2B SaaS or developer‑tools company.
- Familiar with mobile app testing (iOS/Android).
We’re not just another startup. We’re building something category‑defining and want teammates who crave ownership, love solving hard problems, and thrive in a high‑bar, high‑impact environment.
Benefits
- Competitive compensation and early equity in a fast‑growing, venture‑backed company.
- Comprehensive medical, dental, and vision coverage.
- 3 weeks of vacation, plus unlimited sick and mental health days, and a company‑wide end‑of‑year shutdown to recharge.
- $500 stipend for home office setup.
- Unlimited token usage and access to AI tools.
- A fast‑moving, high‑impact environment where your leadership and ideas directly shape the future of the company.
If this sounds like the kind of challenge and opportunity you’re looking for, apply now and let’s build something great together.
Rootly is an equal opportunity employer. We aim to create an environment where every team member at Rootly feels like they belong so they can have a greater impact on our business and customers. We do not discriminate on the basis of race, religion, colour, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.