AI Evaluation Scientist – LLM Research

Job in Toronto, Ontario, C6A, Canada
Listing for: Cohere
Full Time position
Listed on 2026-06-07
Job specializations:
  • Research/Development
    Data Scientist
Job Description & How to Apply Below
Join the forefront of AI as a Senior Research Scientist focusing on model evaluation methods. The position centers on developing prototypes that measure LLM capabilities accurately.
This role is vital for advancing the evaluation techniques needed as AI models approach superhuman performance. You will set ambitious benchmarks and build the infrastructure that assesses LLM performance. Strong software engineering skills will be key as you work with cross-functional teams to deliver reliable, repeatable evaluations.

Key Responsibilities:

• Innovate evaluation methods for large language models

• Establish benchmarks that challenge model capacities

• Collaborate closely with teams on evaluation metrics

• Conduct research to refine evaluation efficiency

• Build tools for in-depth analysis of model outputs

Requirements:

• Strong software engineering skills

• Experience analyzing complex LLM data

• Strong focus on measurement alignment and rigor

• Capability to develop prototypes rapidly

• Applicants whose experience does not match every requirement are still encouraged to apply

Become a leader in AI evaluation by developing techniques that shape the future of intelligent models.