V&V Engineer – AI-Driven Testing & Validation
Listed on 2026-05-20
IT/Tech
AI Engineer, Machine Learning/ML Engineer, Data Scientist
Contract (6 months 23 days)
Published 18 days ago
LangGraph
PyTorch
Communication & analytical skills
data validation
cross-functional collaboration
Test Automation & Frameworks
CI/CD integration
Amazon Web Services
Key Responsibilities:
AI/ML & LLM Development/Validation:
Lead end-to-end quality engineering for enterprise AI applications, including LLM-powered products, RAG pipelines, and agentic workflows.
Design and execute prompt validation strategies, evaluating LLM responses for accuracy, semantic relevance, hallucination risk, and safety compliance.
Build automated evaluation pipelines for AI model outputs using metrics such as BLEU, ROUGE, embedding-based similarity, precision, recall, and F1-score.
Validate agentic systems (tool use, multi-step reasoning, planner-executor workflows) for correctness, determinism, and failure mode handling.
Test Automation & Frameworks:
Architect and maintain Python-based automation frameworks for AI/ML model evaluation, regression testing, and continuous model quality monitoring.
Integrate AI testing into CI/CD pipelines, enabling automated evaluation of model updates, prompt changes, and dataset revisions before release.
Develop reusable test harnesses for prompt regression, golden-set evaluation, A/B comparison of model versions, and human-in-the-loop review workflows.
Perform AI data validation across training and inference pipelines using exploratory data analysis (EDA), schema validation, and cross-validation techniques.
Conduct bias detection and fairness analysis across demographic and contextual slices to ensure responsible AI outcomes.
Drive model robustness testing, including adversarial inputs, distribution shift detection, and stress testing under edge cases.
Establish regression testing standards for retraining and fine-tuning cycles to prevent quality drift after model updates.
Partner with client AI engineers to validate solutions built using TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Define quality KPIs and acceptance criteria for AI features, and report quality posture to engineering and product leadership.
Mentor QA engineers on AI evaluation methodologies, ML fundamentals, and modern test automation practices.
Champion responsible AI practices, including safety, transparency, explainability, and compliance with evolving AI governance standards.
Required Qualifications:
10+ years of professional experience in Quality Engineering and Test Automation, validating complex enterprise applications.
Proficient in validating AI/ML systems, including Generative AI and LLM-based applications.
Strong proficiency in Python and experience building automation frameworks from the ground up.
Practical experience with prompt validation, agentic workflow testing, and AI model evaluation.
Working knowledge of evaluation metrics: BLEU, ROUGE, embedding similarity, precision, recall, F1-score, and human-evaluation methodologies.
Experience with AI/ML frameworks and ecosystems: TensorFlow, PyTorch, LangChain, LangGraph, and LlamaIndex.
Solid understanding of data validation techniques: EDA, schema validation, cross-validation, and statistical analysis.
Experience integrating automated testing into CI/CD pipelines (e.g., GitHub Actions, Jenkins, GitLab CI, Azure DevOps).
Familiarity with bias detection, fairness assessment, and AI safety evaluation techniques.
Preferred Qualifications:
Experience with vector databases, retrieval-augmented generation (RAG), and embedding pipelines.
Background in MLOps tooling such as MLflow, Weights & Biases, or similar experiment tracking platforms.
Exposure to LLM observability and evaluation tools (e.g., LangSmith, Ragas, DeepEval, TruLens).
Familiarity with cloud AI services on AWS, Azure, or GCP (Bedrock, Azure OpenAI, Vertex AI).
Knowledge of AI governance frameworks, model cards, and emerging AI regulatory standards.
Bachelor's or Master's degree in Computer Science, Data Science, or a related technical field.
The pay range that the employer in good faith reasonably expects to pay for this position is $30.05/hour - $46.95/hour. Our benefits include medical, dental, vision and retirement benefits.
We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic. Qualified applicants with arrest or conviction records will be considered for employment in accordance with applicable law, including the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.
Unincorporated LA County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: client provided property, including hardware (both of which may include data) entrusted to you from theft, loss or damage; return all…