Senior / Staff Software AI Test Engineer, AI Engineering

Job in Santa Monica, Los Angeles County, California, 90403, USA
Listing for: TWG AI
Full Time position
Listed on 2026-05-27
Job specializations:
  • Software Development
    AI Engineer
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description & How to Apply Below
Position: Senior / Staff Software AI Test Engineer, AI Engineering

At TWG Group Holdings, LLC ("TWG Global"), we drive innovation and business transformation across a range of industries—financial services, insurance, technology, media, and sports—by leveraging data and AI as core assets. Our AI‑first, cloud‑native approach delivers real‑time intelligence and interactive business applications, empowering informed decision‑making for both customers and employees. We prioritize responsible data and AI practices, ensuring ethical standards and regulatory compliance.

Our decentralized structure enables each business unit to operate autonomously, supported by a central AI Solutions Group, while strategic partnerships with leading data and AI vendors fuel game‑changing efforts in marketing, operations, and product development. You will collaborate with management to advance our data and analytics transformation, enhance productivity, and enable agile, data‑driven decisions. By leveraging relationships with top tech startups and universities, you will help create competitive advantages and drive enterprise innovation.

At TWG Global, your contributions will support our goal of sustained growth and superior returns, as we deliver rare value and impact across our businesses.

The Role

TWG Global is seeking a Senior or Staff AI Software Engineer in Test to join our AI Engineering team building commercial‑grade AI products. This is a software engineering role focused on test automation. You won’t just write test cases; you’ll design and build the frameworks, harnesses, evaluation infrastructure, and tooling that make testing AI agents and LLM‑powered applications possible. Agents are written in LangGraph and run on Azure on the TWG side, with a parallel Vercel‑based stack on the Palantir side.

You’ll write eval sets against both, and you’ll validate the surfaces our users actually touch: iOS apps, plugins, and Chrome extensions, not just the model layer. You’ll work shoulder‑to‑shoulder with AI engineers and data scientists, contributing production‑quality code to shared repositories.

Key Responsibilities

Framework and harness engineering
  • Design and build scalable, reusable test automation frameworks for AI agents, LLM‑powered applications, and underlying APIs.
  • Write clean, maintainable Python for test harnesses, eval pipelines, synthetic data generation utilities, and internal tooling.
  • Treat test code as production code: code review, type hints, documentation, library design.
Evaluation infrastructure
  • Build evaluation infrastructure for benchmarking agent performance against SOTA LLMs, competitors, and internal baselines.
  • Own regression suites, golden datasets, rubric‑based evals, and metric dashboards.
  • Build tooling for synthetic test data generation, edge‑case discovery, and adversarial testing.
Resilience and load
  • Design and run release, system, performance, and load tests against streaming, stateful, and async systems.
  • Build chaos and fault injection tooling for token expiry, connection pool exhaustion, provider failover, and cache pressure scenarios.
  • Drive contract testing across LLM providers (Bedrock, Anthropic, OpenAI) to catch parity drift.
CI/CD and observability
  • Integrate automated tests into CI/CD so every model, prompt, and code change is validated before it ships.
  • Build trace‑based assertions on LangGraph state, tool calls, and agent decisions. Debugging an agent failure means replaying graph state, not re‑running a prompt.
  • Make observability a first‑class testing surface (LangSmith, audit logs).
Human‑in‑the‑loop and partnership
  • Implement HIL review workflows where automation alone cannot validate quality, then push the automation boundary outward.
  • Partner with AI engineers and data scientists on model evaluation, training and eval data prep, and root‑cause debugging of complex end‑to‑end failures.
  • Champion quality engineering practices across the team: code review, coverage standards, observability, reproducibility.
  • Ensure user‑centric validation so AI outputs are accurate, reliable, and meet real‑world application needs.
Requirements
  • 3‑7 years of software engineering experience, with a meaningful portion focused on test automation, SDET, or software engineering in test…
Position Requirements
10+ Years work experience