AI Quality Engineer
Location: Atlanta, Fulton County, Georgia, 30301, USA
Listed on: 2026-06-02
Listing for: Abila
Employment type: Full Time
Job specializations: Software Development; AI Engineer, Machine Learning/ML Engineer, Data Scientist
Key Responsibilities
* Design and implement evaluation frameworks (evals) to assess LLM and agentic AI system quality, including accuracy, consistency, safety, and task completion rates.
* Build and maintain automated test pipelines for AI features, covering unit, integration, and end-to-end scenarios across agentic workflows.
* Develop tooling to detect regressions in model behavior, prompt outputs, and agent decision-making across releases.
* Define and track quality metrics for AI systems (e.g., hallucination rates, tool-use accuracy, latency, failure recovery) and surface findings clearly to stakeholders.
* Collaborate with engineers and product managers to identify edge cases, adversarial inputs, and failure modes specific to multi-step agentic pipelines.
* Contribute to prompt evaluation strategies, including red-teaming, adversarial testing, and bias/fairness assessments.
* Participate in design and code reviews with a quality-focused lens, raising concerns about testability and reliability early.
* Help define and document quality standards and best practices for AI/ML features across the team.
* Other duties as assigned.
Qualifications
Required
* Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
* 3-5 years of professional software engineering or quality engineering experience.
* Hands-on experience working with LLMs or agentic AI systems (e.g., GPT-4, Claude, Gemini, or open-source models).
* Proficiency in Python for scripting, test automation, and data analysis.
* Experience designing and running evaluations (evals) for generative AI or LLM-powered features.
* Solid understanding of software testing principles: unit, integration, regression, and end-to-end testing.
* Familiarity with agentic frameworks and concepts (e.g., tool use, multi-step reasoning, retrieval-augmented generation, memory).
* Experience with CI/CD pipelines and integrating automated tests into development workflows.
* Strong analytical skills - able to interpret probabilistic outputs and distinguish meaningful regressions from expected variance.
* Strong written and verbal communication skills; ability to clearly document findings and present quality data to non-technical stakeholders.
* Detail-oriented, with a structured approach to exploring edge cases and failure scenarios.
* Ability to work in a fast-paced environment and manage multiple priorities effectively.
Nice to Have
* Experience with prompt engineering and systematic prompt evaluation methodologies.
* Familiarity with AI safety, alignment, or responsible AI concepts (e.g., hallucination mitigation, bias detection, guardrails).
* Exposure to agentic orchestration frameworks (e.g., LangChain, LangGraph, AutoGen, CrewAI, or similar).
* Experience with vector databases or RAG pipelines (e.g., Pinecone, Weaviate, pgvector).
* Knowledge of observability and monitoring tools for AI systems (e.g., LangSmith, Weights & Biases, Arize).
* Background in data science or ML experimentation practices.
* Experience with version control systems (Git) and defect-tracking tools (e.g., Jira).
* Exposure to cloud platforms (e.g., AWS, Azure, GCP) in the context of deploying or testing AI services.
What Success Looks Like
* Builds robust eval frameworks that catch meaningful regressions in AI behavior before they reach production.
* Reduces time-to-detection for quality issues in agentic workflows through effective automation and monitoring.
* Contributes clear, actionable quality signals that help the team make confident release decisions.
* Grows into a trusted voice on AI quality standards, influencing engineering practices across the team.
About Us
Momentive Software amplifies the impact of over 20,000 purpose-driven organizations in over 30 countries, with over $11 billion raised and 55 million members served to date. Mission-driven nonprofits and associations rely on Momentive's cloud-based software and services to address their most pressing challenges - from engaging their communities to simplifying operations and growing revenue. Designed to help organizations connect more, manage more, and ultimately expect more,…