AI/LLM Engineer
Tential Solutions
Tampa, Hillsborough County, Florida, 33646, USA
Full Time position, listed 2025-12-18
Job specializations:
- IT/Tech: AI Engineer, Machine Learning/ML Engineer
Job Description
This role sits within the QA Center of Excellence, part of a small, highly specialized AI Quality Engineering team: two SDETs and one Data Engineer. The team operates as a shared service across the organization, defining how Large Language Model (LLM)-powered systems are tested, evaluated, observed, and trusted before and after production release.
We are seeking a Senior Software Development Engineer in Test (SDET) with a strong automation and systems-testing background to focus on LLM quality, validation, and evaluation.
In This Role, You Will
- Test LLM-powered applications used across the enterprise
- Build LLM‑driven testing and evaluation workflows
- Define organization‑wide standards for GenAI quality and reliability
- Design and implement test strategies for LLM-powered systems, including:
  - Prompt and response validation (see the validation sketch after this list)
  - Regression testing across model, prompt, and data changes (see the regression sketch after this list)
  - Evaluation of accuracy, consistency, hallucinations, and safety
- Build and maintain LLM-based evaluation frameworks using tools such as DeepEval, MLflow, Langflow, and LangChain
- Develop synthetic and real‑world test datasets in partnership with the Data Engineer
- Define quality thresholds, scoring mechanisms, and pass/fail criteria for GenAI systems
- Build and maintain automated test frameworks for:
  - LLM APIs and services
  - Agentic and RAG workflows
  - Data and inference pipelines
- Integrate testing and evaluation into CI/CD pipelines, enforcing quality gates before production release (see the gate sketch after this list)
- Partner with engineering teams to improve testability and reliability of AI systems
- Perform root‑cause analysis of failures related to model behavior, data quality, or orchestration logic
- Instrument LLM applications with Datadog LLM Observability (see the instrumentation sketch after this list) to monitor:
  - Latency, token usage, errors, and cost
  - Quality regressions and performance anomalies
- Build dashboards and alerts focused on LLM quality, reliability, and drift
- Use production telemetry to continuously refine test coverage and evaluation strategies
- Act as a consultative partner to product, platform, and data teams adopting LLM technologies
- Provide guidance on:
  - Test strategies for generative AI
  - Prompt and workflow validation
  - Release readiness and risk assessment
- Contribute to organization‑wide standards and best practices for explaining, testing, and monitoring AI systems
- Participate in design and architecture reviews from a quality‑first perspective
- Advocate for automation‑first testing, infrastructure as code, and continuous monitoring
- Drive adoption of Agile, DevOps, and CI/CD best practices within the AI quality space
- Conduct code reviews and promote secure, maintainable test frameworks
- Continuously improve internal tooling and frameworks used by the QA Center of Excellence
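To make the prompt and response validation bullet concrete, here is a minimal sketch of the kind of evaluation case this role would build, assuming DeepEval's pytest integration (one of the tools the listing names). The `query_llm_app` helper, the prompt, and the retrieval context are hypothetical stand-ins for a real application under test, and the relevancy metric needs an LLM judge configured at runtime.

```python
# Hedged sketch: prompt/response validation with DeepEval's pytest integration.
# `query_llm_app` is a hypothetical stand-in for the application under test.
from deepeval import assert_test
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase


def query_llm_app(prompt: str) -> str:
    # Hypothetical client for the LLM-powered app; replace with a real call.
    return "Enterprise customers can request refunds within 30 days."


def test_prompt_response_validation():
    prompt = "What is the refund window for enterprise customers?"
    test_case = LLMTestCase(
        input=prompt,
        actual_output=query_llm_app(prompt),
        retrieval_context=["Enterprise refunds are accepted within 30 days."],
    )
    # Pass/fail criterion: relevancy score must clear a 0.7 threshold.
    assert_test(test_case, [AnswerRelevancyMetric(threshold=0.7)])
```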
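Regression testing across model, prompt, and data changes often reduces to re-scoring a fixed evaluation set under each candidate configuration and comparing aggregates. A framework-agnostic sketch, where the `generate` and `score` callables are hypothetical stand-ins for the team's chosen model client and metric:

```python
# Hedged sketch: detect regressions when a prompt (or model/data) changes by
# re-scoring a fixed eval set and comparing mean scores between versions.
from statistics import mean
from typing import Callable

EVAL_SET = [
    {"input": "Summarize the Q3 incident report.", "reference": "..."},
    {"input": "List open P1 tickets.", "reference": "..."},
]


def run_eval(generate: Callable[[str], str],
             score: Callable[[str, str], float]) -> float:
    """Mean score of one configuration over the fixed eval set."""
    return mean(score(generate(case["input"]), case["reference"])
                for case in EVAL_SET)


def assert_no_regression(baseline: float, candidate: float,
                         tolerance: float = 0.02) -> None:
    # Fail if the candidate drops more than `tolerance` below the baseline.
    if candidate < baseline - tolerance:
        raise AssertionError(
            f"regression: {candidate:.3f} < {baseline:.3f} - {tolerance}")
```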
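For the CI/CD quality gate bullet, one common pattern is a small script that exits nonzero when evaluation scores fall below the agreed thresholds, so the pipeline blocks the release. A sketch assuming an earlier eval stage wrote a JSON report; the file name, metric names, and schema are illustrative:

```python
# Hedged sketch: CI quality gate. Reads a scores report produced by an earlier
# evaluation stage and fails the pipeline step if any threshold is breached.
import json
import sys

THRESHOLDS = {"answer_relevancy_min": 0.70, "hallucination_rate_max": 0.05}


def main(report_path: str = "eval_report.json") -> int:
    with open(report_path) as fh:
        scores = json.load(fh)
    failures = []
    if scores["answer_relevancy"] < THRESHOLDS["answer_relevancy_min"]:
        failures.append("answer_relevancy below threshold")
    if scores["hallucination_rate"] > THRESHOLDS["hallucination_rate_max"]:
        failures.append("hallucination rate above ceiling")
    for failure in failures:
        print(f"QUALITY GATE FAILED: {failure}", file=sys.stderr)
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```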
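The observability bullets center on Datadog LLM Observability; rather than assume that SDK's exact surface, here is a framework-agnostic sketch of the signals being collected (latency, token usage, errors), emitted as a structured record that an agent or exporter could forward. The `emit` target is a placeholder.

```python
# Hedged sketch: generic instrumentation for an LLM call, capturing the same
# signals the listing names (latency, token usage, errors). Not Datadog's SDK;
# emit() is a placeholder for whatever exporter or agent the team uses.
import json
import time
from contextlib import contextmanager


def emit(record: dict) -> None:
    print(json.dumps(record))  # placeholder: forward to the observability agent


@contextmanager
def llm_span(operation: str):
    record = {"operation": operation, "error": None, "tokens": None}
    start = time.perf_counter()
    try:
        yield record  # caller records token counts from the provider response
    except Exception as exc:
        record["error"] = repr(exc)
        raise
    finally:
        record["latency_ms"] = round((time.perf_counter() - start) * 1000, 1)
        emit(record)
```

A caller would wrap each model invocation in `llm_span("chat_completion")` and set `record["tokens"]` from the provider's reported usage, giving the dashboards and drift alerts described above a consistent per-call record to aggregate.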
Qualifications
- 5+ years of experience in SDET, test automation, or quality engineering roles
- Strong Python development skills
- Experience testing backend systems, APIs, or distributed platforms
- Proven experience building and maintaining automation frameworks
- Comfort working with ambiguous, non‑deterministic systems
- Hands‑on experience testing or validating ML‑ or LLM‑based systems
- Familiarity with LLM orchestration and evaluation tools such as:
  - Langflow, LangChain
  - DeepEval, MLflow
- Understanding of challenges unique to testing generative AI systems
- Experience with Datadog (especially LLM Observability)
- Exposure to Hugging Face, PyTorch, or TensorFlow (usage-level)
- Experience testing RAG pipelines, vector DBs, or data-driven platforms
- Background working in platform, shared services, or Center of Excellence teams
- Experience collaborating closely with data engineering or ML platform teams
This Role Is Not
- A pure ML research or model training role
- A feature-focused backend engineering role
- A manual QA role
Why This Role
- You will define how AI quality is measured across the organization
- You will build LLM‑powered testing systems, not just test scripts
- You will influence multiple teams and products, not just one codebase
- You will work at the intersection of AI, automation, and reliability