AI Quality Engineer Job Atlanta area, Georgia USA, Software Development

Job Description

Key Responsibilities

* Design and implement evaluation frameworks (evals) to assess LLM and agentic AI system quality, including accuracy, consistency, safety, and task completion rates.

* Build and maintain automated test pipelines for AI features, covering unit, integration, and end-to-end scenarios across agentic workflows.

* Develop tooling to detect regressions in model behavior, prompt outputs, and agent decision-making across releases.

* Define and track quality metrics for AI systems (e.g., hallucination rates, tool-use accuracy, latency, failure recovery) and surface findings clearly to stakeholders.

* Collaborate with engineers and product managers to identify edge cases, adversarial inputs, and failure modes specific to multi-step agentic pipelines.

* Contribute to prompt evaluation strategies, including red-teaming, adversarial testing, and bias/fairness assessments.

* Participate in design and code reviews with a quality-focused lens, raising concerns about testability and reliability early.

* Help define and document quality standards and best practices for AI/ML features across the team.

* Other duties as assigned.

Qualifications

Required

* Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.

* 3-5 years of professional software engineering or quality engineering experience.

* Hands-on experience working with LLMs or agentic AI systems (e.g., GPT-4, Claude, Gemini, or open-source models).

* Proficiency in Python for scripting, test automation, and data analysis.

* Experience designing and running evaluations (evals) for generative AI or LLM-powered features.

* Solid understanding of software testing principles: unit, integration, regression, and end-to-end testing.

* Familiarity with agentic frameworks and concepts (e.g., tool use, multi-step reasoning, retrieval-augmented generation, memory).

* Experience with CI/CD pipelines and integrating automated tests into development workflows.

* Strong analytical skills - able to interpret probabilistic outputs and distinguish meaningful regressions from expected variance.

* Strong written and verbal communication skills; ability to clearly document findings and present quality data to non-technical stakeholders.

* Detail-oriented, with a structured approach to exploring edge cases and failure scenarios.

* Ability to work in a fast-paced environment and manage multiple priorities effectively.

Nice to Have

* Experience with prompt engineering and systematic prompt evaluation methodologies.

* Familiarity with AI safety, alignment, or responsible AI concepts (e.g., hallucination mitigation, bias detection, guardrails).

* Exposure to agentic orchestration frameworks (e.g., LangChain, LangGraph, AutoGen, CrewAI, or similar).

* Experience with vector databases or RAG pipelines (e.g., Pinecone, Weaviate, pgvector).

* Knowledge of observability and monitoring tools for AI systems (e.g., LangSmith, Weights & Biases, Arize).

* Background in data science or ML experimentation practices.

* Experience with version control systems (Git) and defect-tracking tools (e.g., Jira).

* Exposure to cloud platforms (e.g., AWS, Azure, GCP) in the context of deploying or testing AI services.

What Success Looks Like

* Builds robust eval frameworks that catch meaningful regressions in AI behavior before they reach production.

* Reduces time-to-detection for quality issues in agentic workflows through effective automation and monitoring.

* Contributes clear, actionable quality signals that help the team make confident release decisions.

* Grows into a trusted voice on AI quality standards, influencing engineering practices across the team.


About Us

Momentive Software amplifies the impact of over 20,000 purpose-driven organizations in over 30 countries, with over $11 billion raised and 55 million members served to date. Mission-driven nonprofits and associations rely on Momentive's cloud-based software and services to address their most pressing challenges - from engaging their communities to simplifying operations and growing revenue. Designed to help organizations connect more, manage more, and ultimately expect more,…