Senior Software Engineer in Test (AI Agentic Systems)

Job in Lehi, Utah County, Utah, 84043, USA
Listing for: Collective Health
Full Time position
Listed on 2026-06-04
Job specializations:
  • Software Development
    AI Engineer, Data Scientist
Salary/Wage Range or Industry Benchmark: 99,200 - 148,800 USD yearly
Job Description & How to Apply Below
Position: Senior Software Engineer in Test (AI Agentic Systems)

Lehi, UT

At Collective Health, we’re transforming how employers and their people engage with their health benefits by seamlessly integrating cutting‑edge technology, compassionate service, and world‑class user experience design.

This is not a traditional QA role. You will be the quality owner for an LLM‑based multi‑agent pipeline that autonomously adjudicates health insurance claims for self‑funded plan sponsors. You are building a Three‑Tier Evaluation Framework to ensure our Gemini‑powered agents reason correctly, call tools accurately, and produce DOL‑ready outcomes.

You will work at the intersection of Vertex AI, healthcare compliance, and high‑scale data engineering. Your work directly determines whether claims are paid correctly and whether the company can withstand a Department of Labor (DOL) or state DOI audit. The stakes are real, the domain is hard, and the problems are genuinely novel.

What you’ll do:
  • Outcome Evaluation (The "What")
    • Golden Set Governance:
      Build and maintain a versioned library of "Grounding Data" results by working with senior claims examiners to define "Ground Truth."
    • Model‑as‑a‑Judge Automation:
      Design automated "LLM‑grading‑LLM" workflows using custom rubrics to score factual grounding and policy compliance.
    • Semantic Assertion Framework:
      Develop testing libraries that move beyond string matching to validate semantic equivalence and numerical accuracy in agent outputs.
  • Trajectory Evaluation (The "How")
    • Function‑Call Auditing:
      Use Vertex AI traces to programmatically verify that mandatory tools (via MCP) were invoked with correct arguments.
    • Orchestration Logic Validation:
      Assert that agents respect defined priorities across the four architectural layers:
      Data & Knowledge, Orchestration, Agentic Reasoning, and Tooling.
    • Reasoning Trace Auditing:
      Ensure every autonomous decision is traceable to a specific SOP sentence and a live API data point.
  • Continuous Automated Regression (The "Always")
    • CI/CD Integration:
      Every prompt or model update in Vertex AI Prompt Management must trigger an automated regression run.
    • Auto‑SxS:
      Own the automated pairwise comparison process to detect logic drift between "New" and "Production" agent versions.
    • Mocking & Resilience:
      Build a Vertex AI/ADK mocking layer to simulate model responses, allowing for thousands of logic tests in seconds with zero API costs.
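
As a rough illustration of the function-call auditing and mocking ideas above, the sketch below checks a recorded tool-call trace against a list of mandatory tools. Everything in it is hypothetical: the `ToolCall` shape, the `FakeAgent` stand-in, and the tool names (`check_eligibility`, `lookup_plan_benefits`) are assumptions made for the example, not Vertex AI, ADK, or MCP types.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical trace shape, not a Vertex AI type)."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


def missing_tool_calls(trace: list[ToolCall],
                       required: dict[str, dict[str, Any]]) -> list[str]:
    """Describe each required call that is absent or was made with wrong arguments."""
    problems = []
    for name, expected_args in required.items():
        calls = [c for c in trace if c.name == name]
        if not calls:
            problems.append(f"{name}: not invoked")
            continue
        # A call passes if every expected argument is present with the expected value.
        ok = any(all(c.args.get(k) == v for k, v in expected_args.items())
                 for c in calls)
        if not ok:
            problems.append(f"{name}: invoked with unexpected arguments")
    return problems


class FakeAgent:
    """Stand-in for a model-backed agent: replays scripted tool calls, no API traffic."""

    def __init__(self, scripted: list[ToolCall]):
        self._scripted = scripted

    def run(self, claim_id: str) -> list[ToolCall]:
        return list(self._scripted)


# Hypothetical scenario: the agent checks eligibility but never looks up benefits.
trace = FakeAgent([
    ToolCall("check_eligibility",
             {"member_id": "M-1", "date_of_service": "2026-01-15"}),
]).run("C-100")

required = {
    "check_eligibility": {"member_id": "M-1"},
    "lookup_plan_benefits": {"plan_id": "P-9"},
}
problems = missing_tool_calls(trace, required)
print(problems)
```

Because the fake agent replays canned calls, a check like this runs without any model or network access, which is the property the mocking bullet describes. A real harness would read the trace from whatever the tracing backend actually emits.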
To be successful in this role, you’ll need:
  • Required Skills (The Core Bar)
    • Python SDET Expertise:
      Expert in Python and pytest, specifically building custom mocking frameworks for external APIs (Vertex AI/ADK).
    • AI/LLM Observability:
      Hands‑on experience with Vertex AI Experiments, Auto‑SxS, and Cloud Logging for trace analysis.
    • Data Literacy:
      Expert‑level SQL (BigQuery) and Pandas skills to "diff" massive datasets and identify adjudication discrepancies.
    • Prompt Engineering for QA:
      Ability to analyze "System Instructions" and refine prompts based on failed test cases to close logic gaps.
    • Architectural Testing:
      Experience testing multi‑layer systems involving RAG (Vertex AI Search), state management (LangGraph), and function calling.
  • Preferred Skills (The "Nice‑to‑Haves")
    • Healthcare/Claims Domain:
      Familiarity with claims adjudication concepts (pend reason codes, COB, eligibility, stop‑loss).
    • Compliance Knowledge:
      Understanding of HIPAA/PHI handling and writing test evidence for regulatory bodies (DOL/DOI).
    • Human‑in‑the‑Loop Testing:
      Experience in "Shadow Mode" monitoring—comparing agent decisions against human expert (MCA) baselines.
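
For a sense of what a "Shadow Mode" comparison might look like in practice, here is a minimal sketch that diffs agent decisions against a human baseline and reports an agreement rate. The record layout (`status`, `paid`), the claim IDs, and the one-cent tolerance are assumptions chosen for illustration. It uses only the standard library rather than BigQuery or Pandas, which a production-scale diff would more likely rely on.

```python
from decimal import Decimal


def shadow_diff(agent: dict[str, dict], human: dict[str, dict],
                tolerance: Decimal = Decimal("0.01")) -> tuple[float, list[tuple]]:
    """Compare decisions keyed by claim ID; return (agreement_rate, discrepancies)."""
    shared = sorted(agent.keys() & human.keys())
    discrepancies = []
    for claim_id in shared:
        a, h = agent[claim_id], human[claim_id]
        if a["status"] != h["status"]:
            discrepancies.append((claim_id, "status", a["status"], h["status"]))
        elif abs(Decimal(a["paid"]) - Decimal(h["paid"])) > tolerance:
            # Decimal avoids float rounding noise when comparing dollar amounts.
            discrepancies.append((claim_id, "paid", a["paid"], h["paid"]))
    agreed = len(shared) - len(discrepancies)
    rate = agreed / len(shared) if shared else 0.0
    return rate, discrepancies


# Hypothetical sample data: four claims, two of which disagree.
agent_decisions = {
    "C-1": {"status": "paid", "paid": "120.00"},
    "C-2": {"status": "denied", "paid": "0.00"},
    "C-3": {"status": "paid", "paid": "80.50"},
    "C-4": {"status": "paid", "paid": "10.00"},
}
human_decisions = {
    "C-1": {"status": "paid", "paid": "120.00"},
    "C-2": {"status": "paid", "paid": "45.00"},
    "C-3": {"status": "paid", "paid": "85.50"},
    "C-4": {"status": "paid", "paid": "10.00"},
}

rate, discrepancies = shadow_diff(agent_decisions, human_decisions)
print(rate, discrepancies)
```

Each claim contributes at most one discrepancy, so the agreement rate is the share of shared claims where both status and paid amount match.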
Pay Transparency Statement

This is a hybrid position based out of our Lehi office, with the expectation of being in office at least two weekdays per week. #LI-hybrid

The actual pay rate offered within the range will depend on factors including geographic location, qualifications, experience, and internal equity. In addition to the salary, you will be eligible for 115000 stock options and benefits like health insurance, 401k, and paid time off. Learn more about our benefits at  .

Lehi, UT Pay Range

$99,200 - $148,800 USD

Why Join Us?
  • Mission‑driven culture that values innovation, collaboration, and a commitment to excellence in…
Position Requirements
10+ Years work experience