Applied Scientist
6300, Zug, Kanton Zug, Switzerland
Listed on 2025-12-15
IT/Tech
AI Engineer, Data Scientist, Machine Learning/ ML Engineer
Are you passionate about advancing the science of evaluating large language models and intelligent agents? Join Thomson Reuters Labs, where we experiment, build, and deliver cutting‑edge AI systems that empower professionals worldwide.
Our flagship AI assistant, CoCounsel, helps legal, tax, and business professionals work smarter. We’re expanding our LLM Evaluation team, focused on developing automated, scalable, and trustworthy evaluation frameworks that measure model reasoning, reliability, and alignment.
At Thomson Reuters Labs, we blend applied research with real‑world impact. Our scientists work on projects spanning LLM reasoning, benchmarking, grounding, and agentic behavior, all aimed at ensuring our AI systems are effective, explainable, and robust.
We believe that rigorous evaluation is the foundation of responsible AI.
This role offers the opportunity to push the boundaries of auto‑evaluation, LLM‑as‑a‑judge, and agentic evaluation methodologies, influencing how AI systems are measured and improved at scale.
Design and Conduct Evaluations: Develop and execute evaluation pipelines for LLMs and agentic systems, assessing reasoning, factual accuracy, and alignment.
Automate and Scale: Build tools and frameworks for automatic evaluation, including synthetic dataset creation, LLM‑as‑a‑judge workflows, and continuous benchmarking systems.
Collaborate and Translate: Partner with applied scientists, ML engineers, and product managers to translate evaluation results into model improvements and product insights.
Research and Experiment: Prototype new evaluation metrics, contribute to internal reports, and support publications or presentations on evaluation methods.
Champion Best Practices: Promote reproducibility, transparency, and ethical AI evaluation within the team and broader organization.
PhD in Computer Science, Artificial Intelligence, Machine Learning, or a related field (exceptional Master’s candidates with equivalent experience will be considered).
Research or hands‑on experience with large language models, NLP evaluation, or agent‑based AI systems.
Strong understanding of LLM performance measurement, prompt evaluation, and reliability testing.
Proficiency in Python and familiarity with ML libraries such as PyTorch, Transformers, and LangChain.
Comfort with experimental design, data analysis, and communicating technical findings clearly.
Experience with LLM evaluation frameworks (e.g., OpenAI Evals, HELM, LM Harness, or custom auto‑eval tools).
Familiarity with retrieval‑augmented generation (RAG), tool‑using agents, or agentic evaluation methodologies.
Experience in cloud‑based ML development (AWS, Azure, or GCP).
Record of publications or preprints in top‑tier venues (e.g., NeurIPS, ACL, EMNLP, ICLR) or equivalent research contributions.
Interest in Responsible AI, fairness, and interpretability research.
Hybrid Work Model: We’ve adopted a flexible hybrid working environment (2‑3 days a week in the office depending on the role) for our office‑based roles while delivering a seamless experience that is digitally and physically connected.
Flexibility & Work‑Life Balance: Flex My Way is a set of supportive workplace policies designed to help manage personal and professional responsibilities, whether caring for family, giving back to the community, or finding time to refresh and reset. This builds upon our flexible work arrangements, including work from anywhere for up to 8 weeks per year, empowering employees to achieve a better work‑life balance.
Career Development and Growth: By fostering a culture of continuous learning and skill development, we prepare our talent to tackle tomorrow’s challenges and deliver real‑world solutions. Our Grow My Way programming and skills‑first approach ensures you have the tools and knowledge to grow, lead, and thrive in an AI‑enabled future.
Industry Competitive Benefits: We offer comprehensive benefit plans to include flexible vacation, two company‑wide Mental Health Days off, access to the Headspace app, retirement savings, tuition reimbursement, employee incentive programs, and…