Senior AI Scientist

Job in Durham, Durham County, North Carolina, 27703, USA
Listing for: IQVIA, Inc.
Full Time position
Listed on 2026-06-05
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Data Scientist, Machine Learning/ML Engineer
Job Description & How to Apply Below

Role Overview


We are seeking a Senior AI Scientist to lead the design, development, and operationalization of evaluation frameworks for Generative AI systems, with a primary focus on Large Language Models (LLMs) and agentic AI solutions.

This role will be responsible for defining and implementing robust methods to assess quality, safety, reliability, and business impact across LLM-powered applications and multi-agent workflows. The position operates within regulated environments such as life sciences, clinical research, and regulatory domains, ensuring that AI systems meet enterprise and compliance standards.

Key Responsibilities

1. LLM Evaluation & Benchmarking
  • Design and implement scalable evaluation frameworks for LLMs across use cases including:
    • Question answering, summarization, information extraction, and reasoning
    • Clinical and regulatory document generation (e.g., ICFs, CSRs, protocols)
  • Develop both automated and human-in-the-loop evaluation pipelines
  • Define, measure, and monitor key performance metrics, including:
    • Accuracy, factuality, faithfulness, and hallucination rate
    • Robustness, consistency, latency, and cost-performance trade-offs
  • Build domain-specific benchmarks using real-world clinical and regulatory data and real-world data (RWD)
2. Agentic AI & Multi-Agent Evaluation
  • Establish evaluation strategies for agent-based and multi-agent systems
  • Measure and analyze:
    • Task completion success rates
    • Planning and reasoning quality
    • Tool usage accuracy
    • Inter-agent coordination and failure patterns
  • Develop scenario-based and simulation-driven evaluation environments
  • Evaluate orchestration frameworks (e.g., LangGraph, Semantic Kernel, Claude Agents)
3. End-to-End System Evaluation
  • Define evaluation strategies for complete AI pipelines, including:
    • Retrieval-Augmented Generation (RAG) systems
    • Tool-augmented agents
    • Knowledge graph + LLM architectures
  • Implement offline and online evaluation mechanisms, such as:
    • A/B testing and canary releases
    • Production monitoring and model drift detection
  • Enable observability and traceability using tools such as LangSmith and OpenTelemetry
4. Responsible AI & Compliance
  • Ensure all evaluation practices align with Responsible AI principles and regulatory requirements (e.g., GxP)
  • Assess and mitigate risks related to:
    • Bias, fairness, safety, and explainability
    • Data leakage and privacy concerns
  • Develop audit-ready evaluation frameworks suitable for regulated environments (e.g., FDA, EMA)
5. Tooling & Platform Development
  • Build and scale evaluation tooling, including:
    • Automated evaluation pipelines
    • Prompt and version tracking systems
    • Experiment management platforms
  • Integrate evaluation frameworks with enterprise AI platforms (e.g., Azure, Databricks, AWS, on-prem GPU environments)
6. Leadership & Collaboration
  • Collaborate with cross-functional teams including:
    • AI/ML engineers, product teams, and domain scientists
    • Clinical, regulatory, and real-world evidence stakeholders
  • Establish enterprise-wide evaluation standards and best practices
  • Mentor junior team members and contribute to strategic AI initiatives

Required Qualifications
  • Master's or PhD in Computer Science, Artificial Intelligence, Machine Learning, or a related field
  • 5+ years of experience in applied AI/ML, with a strong focus on Generative AI and LLMs
  • Demonstrated experience in:
    • LLM evaluation (both automated and human-in-the-loop)
    • Prompt engineering and model behavior analysis
    • Python programming using frameworks such as PyTorch or TensorFlow
  • Hands-on experience with:
    • RAG systems, embeddings, and vector databases
    • Agent frameworks (e.g., LangChain, LangGraph, Semantic Kernel)
  • Strong understanding of:
    • Evaluation metrics and experimental design
    • Model limitations, failure modes, and debugging techniques

Preferred Qualifications
  • Experience in life sciences, healthcare, or regulated environments
  • Familiarity with:
    • Clinical trial workflows (ICF, CSR, TMF, regulatory submissions)
    • Knowledge graphs and biomedical data systems
  • Experience with evaluation tools such as LangSmith, Promptfoo, DeepEval, HELM, or OpenAI Evals
  • Exposure to Responsible AI frameworks and regulatory compliance standards

What Success Looks Like
  • Standardized evaluation frameworks adopted across AI teams
  • Measurable…
Position Requirements
10+ years of work experience