US_East | Software Developer - Testing Tools/Automation/Performance

Position: US_East | Software Developer - Testing Tools/Automation/Performance _L2

V&V Engineer – AI-Driven Testing & Validation

Location: Plano, TX

Key Responsibilities

Lead end-to-end quality engineering for enterprise AI applications, including LLM-powered products, RAG pipelines, and agentic workflows.
Design and execute prompt validation strategies, evaluating LLM responses for accuracy, semantic relevance, hallucination risk, and safety compliance.
Build automated evaluation pipelines for AI model outputs using metrics such as BLEU, ROUGE, embedding-based similarity, precision, recall, and F1-score.
Validate agentic systems (tool use, multi-step reasoning, planner-executor workflows) for correctness, determinism, and failure-mode handling.
Architect and maintain Python-based automation frameworks for AI/ML model evaluation, regression testing, and continuous model quality monitoring.
Integrate AI testing into CI/CD pipelines, enabling automated evaluation of model updates, prompt changes, and dataset revisions before release.
Develop reusable test harnesses for prompt regression, golden-set evaluation, A/B comparison of model versions, and human-in-the-loop review workflows.
Perform AI data validation across training and inference pipelines using exploratory data analysis (EDA), schema validation, and cross-validation techniques.
Conduct bias detection and fairness analysis across demographic and contextual slices to ensure responsible AI outcomes.
Drive model robustness testing, including adversarial-input testing, distribution-shift detection, and stress testing on edge cases.
Establish regression testing standards for retraining and fine-tuning cycles to prevent quality drift after model updates.
Partner with client AI engineers to validate solutions built using TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Define quality KPIs and acceptance criteria for AI features, and report quality posture to engineering and product leadership.
Mentor QA engineers on AI evaluation methodologies, ML fundamentals, and modern test automation practices.
Champion responsible AI practices, including safety, transparency, explainability, and compliance with evolving AI governance standards.

Required Qualifications

10+ years of professional experience in Quality Engineering and Test Automation, validating complex enterprise applications.
Proficiency in validating AI/ML systems, including Generative AI and LLM-based applications.
Strong proficiency in Python and experience building automation frameworks from the ground up.
Practical experience with prompt validation, agentic workflow testing, and AI model evaluation.
Working knowledge of evaluation metrics: BLEU, ROUGE, embedding similarity, precision, recall, F1-score, and human-evaluation methodologies.
Experience with AI/ML frameworks and ecosystems: TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Solid understanding of data validation techniques: EDA, schema validation, cross-validation, and statistical analysis.
Experience integrating automated testing into CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
Familiarity with bias detection, fairness assessment, and AI safety evaluation techniques.
Bachelor's or Master's degree in Computer Science, Data Science, or a related technical field.

Preferred Qualifications

Experience with vector databases, retrieval-augmented generation (RAG), and embedding pipelines.
Background in MLOps tooling such as MLflow, Weights & Biases, or similar experiment tracking platforms.
Exposure to LLM observability and evaluation tools (e.g., LangSmith, Ragas, DeepEval, TruLens).
Familiarity with cloud AI services on AWS, Azure, or GCP (e.g., Amazon Bedrock, Azure OpenAI, Vertex AI).
Knowledge of AI governance frameworks, model cards, and emerging AI regulatory standards.
