
AI/LLM Engineer

Job in Tampa, Hillsborough County, Florida, 33646, USA
Listing for: Tential Solutions
Full Time position
Listed on 2025-12-18
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning / ML Engineer
Job Description


Senior SDET – AI / LLM Quality Engineering (Shared Services)

About The Team

This role sits within the QA Center of Excellence, part of a small, highly specialized AI Quality Engineering team: two SDETs and one Data Engineer. The team operates as a shared service across the organization, defining how Large Language Model (LLM)–powered systems are tested, evaluated, observed, and trusted before and after production release.

Role Overview

We are seeking a Senior Software Development Engineer in Test (SDET) with a strong automation and systems‑testing background to focus on LLM quality, validation, and evaluation.

In This Role, You Will
  • Test LLM‑powered applications used across the enterprise
  • Build LLM‑driven testing and evaluation workflows
  • Define organization‑wide standards for GenAI quality and reliability
Key Responsibilities

LLM Testing & Evaluation
  • Design and implement test strategies for LLM‑powered systems, including:
    • Prompt and response validation
    • Regression testing across model, prompt, and data changes
    • Evaluation of accuracy, consistency, hallucinations, and safety
  • Build and maintain LLM‑based evaluation frameworks using tools such as DeepEval, MLflow, Langflow, and LangChain
  • Develop synthetic and real‑world test datasets in partnership with the Data Engineer
  • Define quality thresholds, scoring mechanisms, and pass/fail criteria for GenAI systems
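To illustrate the kind of threshold‑based pass/fail evaluation described above, here is a minimal sketch in plain Python. The scoring function and helper names are hypothetical stand‑ins, not the team's actual framework; a real setup would use a semantic metric (e.g. from DeepEval) rather than simple keyword coverage.

```python
# Illustrative sketch: turn a continuous quality score into a pass/fail
# verdict against a defined threshold. The keyword-coverage metric below
# is a hypothetical placeholder for a real LLM evaluation metric.
from dataclasses import dataclass

@dataclass
class EvalResult:
    score: float      # 0.0 .. 1.0
    passed: bool
    threshold: float

def keyword_coverage(response: str, expected_keywords: list[str]) -> float:
    """Fraction of expected keywords present in the response (case-insensitive)."""
    text = response.lower()
    hits = sum(1 for kw in expected_keywords if kw.lower() in text)
    return hits / len(expected_keywords) if expected_keywords else 1.0

def evaluate_response(response: str, expected_keywords: list[str],
                      threshold: float = 0.7) -> EvalResult:
    """Apply a quality threshold to convert the score into a pass/fail result."""
    score = keyword_coverage(response, expected_keywords)
    return EvalResult(score=score, passed=score >= threshold, threshold=threshold)

result = evaluate_response(
    "Our refund policy allows returns within 30 days of purchase.",
    ["refund", "30 days", "returns"],
)
print(result.passed, round(result.score, 2))
```

The same shape extends to regression testing: rerun the suite after any model, prompt, or data change and compare the resulting scores against the recorded baseline.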
Test Automation & Framework Development
  • Build and maintain automated test frameworks for:
    • LLM APIs and services
    • Agentic and RAG workflows
    • Data and inference pipelines
  • Integrate testing and evaluation into CI/CD pipelines, enforcing quality gates before production release
  • Partner with engineering teams to improve testability and reliability of AI systems
  • Perform root‑cause analysis of failures related to model behavior, data quality, or orchestration logic
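A CI/CD quality gate of the kind mentioned above can be sketched as follows. The metric names and thresholds are assumptions for illustration; in practice they would be owned by the QA Center of Excellence and the step would fail the pipeline when any metric violates its threshold.

```python
# Illustrative CI quality-gate sketch (hypothetical metric names and
# thresholds): block release if any aggregated evaluation metric falls
# below its agreed minimum.
THRESHOLDS = {"accuracy": 0.85, "consistency": 0.90, "hallucination_free": 0.95}

def gate(metric_scores: dict[str, float]) -> list[str]:
    """Return the metrics that violate their thresholds (empty list = gate passes)."""
    return [m for m, t in THRESHOLDS.items() if metric_scores.get(m, 0.0) < t]

scores = {"accuracy": 0.91, "consistency": 0.88, "hallucination_free": 0.97}
failures = gate(scores)
if failures:
    print(f"Quality gate FAILED: {failures}")
else:
    print("Quality gate passed")
```

In a pipeline, a non-empty failure list would translate to a nonzero exit code, preventing promotion to production.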
Observability & Monitoring
  • Instrument LLM applications with Datadog LLM Observability to monitor:
    • Latency, token usage, errors, and cost
    • Quality regressions and performance anomalies
  • Build dashboards and alerts focused on LLM quality, reliability, and drift
  • Use production telemetry to continuously refine test coverage and evaluation strategies
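One simple form of the drift detection alluded to above can be sketched in plain Python (this is not a Datadog API example; the tolerance value is an assumption): compare the mean quality score of a recent production window against a baseline and flag a regression when it drops too far.

```python
# Illustrative drift-detection sketch: alert when the mean quality score
# of a recent window falls more than `tolerance` below the baseline mean.
from statistics import mean

def quality_drift(baseline: list[float], recent: list[float],
                  tolerance: float = 0.05) -> bool:
    """True when recent mean quality drops more than `tolerance` below baseline."""
    return mean(baseline) - mean(recent) > tolerance

baseline_scores = [0.92, 0.90, 0.93, 0.91]
recent_scores = [0.84, 0.82, 0.85]
print("drift detected:", quality_drift(baseline_scores, recent_scores))
```

In production, the same comparison would typically be expressed as a monitor over telemetry (e.g. in Datadog), with the flagged windows feeding back into new regression test cases.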
Shared Services & Collaboration
  • Act as a consultative partner to product, platform, and data teams adopting LLM technologies
  • Provide guidance on:
    • Test strategies for generative AI
    • Prompt and workflow validation
    • Release readiness and risk assessment
  • Contribute to organization‑wide standards and best practices for explaining, testing, and monitoring AI systems
  • Participate in design and architecture reviews from a quality‑first perspective
Engineering Excellence
  • Advocate for automation‑first testing, infrastructure as code, and continuous monitoring
  • Drive adoption of Agile, DevOps, and CI/CD best practices within the AI quality space
  • Conduct code reviews and promote secure, maintainable test frameworks
  • Continuously improve internal tooling and frameworks used by the QA Center of Excellence
Required Skills & Experience

Core SDET Experience
  • 5+ years of experience in SDET, test automation, or quality engineering roles
  • Strong Python development skills
  • Experience testing backend systems, APIs, or distributed platforms
  • Proven experience building and maintaining automation frameworks
  • Comfort working with ambiguous, non‑deterministic systems
AI / LLM Experience
  • Hands‑on experience testing or validating ML‑ or LLM‑based systems
  • Familiarity with LLM orchestration and evaluation tools such as:
    • Langflow, LangChain
    • DeepEval, MLflow
  • Understanding of challenges unique to testing generative AI systems
Nice to Have
  • Experience with Datadog (especially LLM Observability)
  • Exposure to Hugging Face, PyTorch, or TensorFlow (usage‑level)
  • Experience testing RAG pipelines, vector DBs, or data‑driven platforms
  • Background working in platform, shared services, or Center of Excellence teams
  • Experience collaborating closely with data engineering or ML platform teams
What This Role Is Not
  • Not a pure ML research or model training role
  • Not a feature‑focused backend engineering role
  • Not manual QA
Why This Role Is Unique
  • You will define how AI quality is measured across the organization
  • You will build LLM‑powered testing systems, not just test scripts
  • You will influence multiple teams and products, not just one codebase
  • You will work at the intersection of AI, automation, and reliability
Remote