
AI/LLM Engineer

Job in Tampa, Hillsborough County, Florida, 33646, USA
Listing for: Tential Solutions
Full Time position
Listed on 2025-12-18
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning / ML Engineer
Job Description


Senior SDET – AI / LLM Quality Engineering (Shared Services)

About The Team

This role sits within the QA Center of Excellence, part of a small, highly specialized AI Quality Engineering team: two SDETs and one Data Engineer. The team operates as a shared service across the organization, defining how Large Language Model (LLM)–powered systems are tested, evaluated, observed, and trusted before and after production release.

Role Overview

We are seeking a Senior Software Development Engineer in Test (SDET) with a strong automation and systems‑testing background to focus on LLM quality, validation, and evaluation.

In This Role, You Will
  • Test LLM‑powered applications used across the enterprise
  • Build LLM‑driven testing and evaluation workflows
  • Define organization‑wide standards for GenAI quality and reliability
Key Responsibilities

LLM Testing & Evaluation
  • Design and implement test strategies for LLM‑powered systems, including:
    • Prompt and response validation
    • Regression testing across model, prompt, and data changes
    • Evaluation of accuracy, consistency, hallucinations, and safety
  • Build and maintain LLM‑based evaluation frameworks using tools such as DeepEval, MLflow, Langflow, and LangChain
  • Develop synthetic and real‑world test datasets in partnership with the Data Engineer
  • Define quality thresholds, scoring mechanisms, and pass/fail criteria for GenAI systems
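To illustrate the kind of threshold‑based pass/fail evaluation described above, here is a minimal sketch in plain Python. The scoring function and helper names are hypothetical stand‑ins, not the team's actual framework; a real setup would use a semantic metric (e.g. from DeepEval) rather than simple keyword coverage.

```python
# Illustrative sketch: turn a continuous quality score into a pass/fail
# verdict against a defined threshold. The keyword-coverage metric below
# is a hypothetical placeholder for a real LLM evaluation metric.
from dataclasses import dataclass

@dataclass
class EvalResult:
    score: float      # 0.0 .. 1.0
    passed: bool
    threshold: float

def keyword_coverage(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords present in the response (case-insensitive)."""
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 1.0

def evaluate_response(response: str, expected_keywords: list[str],
                      threshold: float = 0.7) -> EvalResult:
    """Apply a quality threshold to convert the score into a pass/fail result."""
    score = keyword_coverage(response, expected_keywords)
    return EvalResult(score=score, passed=score >= threshold, threshold=threshold)

result = evaluate_response(
    "Our refund policy allows returns within 30 days of purchase.",
    ["refund", "30 days", "returns"],
)
print(result.passed, round(result.score, 2))
```

The same shape extends to regression testing: rerun the suite after any model, prompt, or data change and compare the resulting scores against the recorded baseline.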
Test Automation & Framework Development
  • Build and maintain automated test frameworks for:
    • LLM APIs and services
    • Agentic and RAG workflows
    • Data and inference pipelines
  • Integrate testing and evaluation into CI/CD pipelines, enforcing quality gates before production release
  • Partner with engineering teams to improve testability and reliability of AI systems
  • Perform root‑cause analysis of failures related to model behavior, data quality, or orchestration logic
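A CI/CD quality gate of the kind mentioned above can be sketched as follows. The metric names and thresholds are assumptions for illustration; in practice they would be owned by the QA Center of Excellence and the step would fail the pipeline when any metric violates its threshold.

```python
# Illustrative CI quality-gate sketch (hypothetical metric names and
# thresholds): block release if any aggregated evaluation metric falls
# below its agreed minimum.
THRESHOLDS = {"accuracy": 0.85, "consistency": 0.90, "hallucination_free": 0.95}

def gate(metric_scores: dict[str, float]) -> list[str]:
    """Return the metrics that violate their thresholds (empty list = gate passes)."""
    return [m for m, t in THRESHOLDS.items() if metric_scores.get(m, 0.0) < t]

scores = {"accuracy": 0.91, "consistency": 0.88, "hallucination_free": 0.97}
failures = gate(scores)
if failures:
    print(f"Quality gate FAILED: {failures}")
else:
    print("Quality gate passed")
```

In a pipeline, a non-empty failure list would translate to a nonzero exit code, preventing promotion to production.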
Observability & Monitoring
  • Instrument LLM applications with Datadog LLM Observability to monitor:
    • Latency, token usage, errors, and cost
    • Quality regressions and performance anomalies
  • Build dashboards and alerts focused on LLM quality, reliability, and drift
  • Use production telemetry to continuously refine test coverage and evaluation strategies
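One simple form of the drift detection alluded to above can be sketched in plain Python (this is not a Datadog API example; the tolerance value is an assumption): compare the mean quality score of a recent production window against a baseline and flag a regression when it drops too far.

```python
# Illustrative drift-detection sketch: alert when the mean quality score
# of a recent window falls more than `tolerance` below the baseline mean.
from statistics import mean

def quality_drift(baseline: list[float], recent: list[float],
                  tolerance: float = 0.05) -> bool:
    """True when recent mean quality drops more than `tolerance` below baseline."""
    return mean(baseline) - mean(recent) > tolerance

baseline_scores = [0.92, 0.90, 0.93, 0.91]
recent_scores = [0.84, 0.82, 0.85]
print("drift detected:", quality_drift(baseline_scores, recent_scores))
```

In production, the same comparison would typically be expressed as a monitor over telemetry (e.g. in Datadog), with the flagged windows feeding back into new regression test cases.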
Shared Services & Collaboration
  • Act as a consultative partner to product, platform, and data teams adopting LLM technologies
  • Provide guidance on:
    • Test strategies for generative AI
    • Prompt and workflow validation
    • Release readiness and risk assessment
  • Contribute to organization‑wide standards and best practices for explaining, testing, and monitoring AI systems
  • Participate in design and architecture reviews from a quality‑first perspective
Engineering Excellence
  • Advocate for automation‑first testing, infrastructure as code, and continuous monitoring
  • Drive adoption of Agile, DevOps, and CI/CD best practices within the AI quality space
  • Conduct code reviews and promote secure, maintainable test frameworks
  • Continuously improve internal tooling and frameworks used by the QA Center of Excellence
Required Skills & Experience

Core SDET Experience
  • 5+ years of experience in SDET, test automation, or quality engineering roles
  • Strong Python development skills
  • Experience testing backend systems, APIs, or distributed platforms
  • Proven experience building and maintaining automation frameworks
  • Comfort working with ambiguous, non‑deterministic systems
AI / LLM Experience
  • Hands‑on experience testing or validating ML‑ or LLM‑based systems
  • Familiarity with LLM orchestration and evaluation tools such as:
    • Langflow, LangChain
    • DeepEval, MLflow
  • Understanding of challenges unique to testing generative AI systems
Nice to Have
  • Experience with Datadog (especially LLM Observability)
  • Exposure to Hugging Face, PyTorch, or TensorFlow (usage‑level)
  • Experience testing RAG pipelines, vector DBs, or data‑driven platforms
  • Background working in platform, shared services, or Center of Excellence teams
  • Experience collaborating closely with data engineering or ML platform teams
What This Role Is Not
  • Not a pure ML research or model training role
  • Not a feature‑focused backend engineering role
  • Not manual QA
Why This Role Is Unique
  • You will define how AI quality is measured across the organization
  • You will build LLM‑powered testing systems, not just test scripts
  • You will influence multiple teams and products, not just one codebase
  • You will work at the intersection of AI, automation, and reliability
Remote