QA Engineer Job, Charlotte area, North Carolina, USA, Software Development

The Sr. AI Test Engineer is responsible for designing, building, and running the tests that prove AI systems behave as intended in production. Working under the direction of the AI Test Lead Architect, this role translates testing methodology into working test suites, evaluation harnesses, and quality gates for deterministic and non-deterministic systems (ML models, GenAI, and LLM applications).

The role is cloud-native by design: AI workloads are tested where they run. This requires deep expertise in one major cloud platform, whether AWS (SageMaker, Bedrock), GCP (Vertex AI), or Azure (Azure ML, Azure OpenAI), with quality embedded directly into CI/CD and MLOps pipelines. The engineer partners closely with data scientists, ML engineers, and product teams to shift quality left and catch model, data, and behavioral issues before they reach users.

While this is a senior individual-contributor role, the Sr. AI Test Engineer is expected to mentor other testers, set technical standards for AI quality, and act as a trusted technical voice in client-facing conversations.

Roles and Responsibilities

AI Testing & Evaluation

- Design and implement test strategies for deterministic and non-deterministic AI systems (ML models, GenAI, LLMs), focusing on probabilistic correctness rather than simple pass/fail assertions.

- Build and maintain evaluation harnesses covering offline (benchmark datasets, golden sets) and online (production monitoring, A/B) evaluation.

- Validate LLM and GenAI behavior (hallucination, groundedness, prompt robustness, toxicity, and prompt-injection resilience) using automated and human-in-the-loop methods.

- Test for model quality and risk across accuracy, drift, robustness, bias, fairness, and explainability.

- Collect and analyze model quality metrics, including precision, recall, F1 score, and confusion matrices, and translate the results into clear quality signals.

Cloud & Platform Testing (AWS, GCP, or Azure)

- Test AI/ML workloads deployed on your primary cloud platform, whether AWS (SageMaker, Bedrock), GCP (Vertex AI), or Azure (Azure ML, Azure OpenAI), validating model endpoints, inference performance, and scaling behavior.

- Validate data pipelines, feature stores, and model artifacts for quality, lineage, and consistency across cloud environments.

- Conduct performance, load, and latency testing of model-serving endpoints and GenAI APIs under realistic and adversarial conditions.

- Apply cloud-native testing patterns and infrastructure-as-code to make AI test environments reproducible.

Automation, Accelerators & Tooling

- Build reusable automation frameworks for AI regression testing, GenAI prompt validation, dataset validation, and drift detection.

- Establish AI quality gates embedded in CI/CD and MLOps workflows so that model and data quality are verified on every change.

- Develop and evolve AI testing accelerators across SDLC integration, automation, and runtime monitoring/observability.

- Implement automated reporting that surfaces model quality, drift, and risk indicators to engineering and delivery teams.

Collaboration, Delivery & Client Engagement

- Partner with data science, ML engineering, and product teams to embed quality early and continuously (shift-left).

- Apply AI testing approaches across Agile, Waterfall, and hybrid delivery models.

- Engage confidently with technical client stakeholders; support AI quality assessments, demos, and proofs of value.

- Mentor junior testers and set technical standards for AI quality within the delivery team.

Skills Required

Core Skills & Experience

- Hands-on experience testing AI/ML and GenAI systems, including evaluation of training and inference, drift, bias, and explainability.

- Strong test automation skills with a programming language commonly used in AI (Python strongly preferred).

- Demonstrated experience building test or evaluation frameworks for ML or LLM systems.

- Familiarity with collecting and analyzing model quality metrics such as precision, recall, F1 score, and confusion matrices.

- Experience integrating automated tests and quality gates into CI/CD and MLOps pipelines.

Technical & Platform Expertise

- Deep, hands-on expertise in one major cloud platform (AWS, GCP, or Azure) and its AI/ML services (e.g., SageMaker and Bedrock; Vertex AI; or Azure ML and Azure OpenAI). Familiarity with a second cloud is a plus.

- Test automation frameworks and data validation strategies.

- Monitoring, observability, and AI system reporting.

- Shift-left testing and continuous quality engineering.

- Familiarity with AI evaluation tooling (e.g., DeepEval, Ragas, LangSmith/Langfuse, Evidently, MLflow) is a strong plus.

Communication & Collaboration

- Clear communication with both technical and non-technical audiences.

- Consultative mindset focused on outcomes, risk reduction, and business value.

- Comfortable working in open, dynamic, and collaborative team environments.

Other Skills and Traits

- Strong analytical, problem-solving, and systems-thinking abilities.

- Self-starter with a proactive,…