×
Register Here to Apply for Jobs or Post Jobs. X

Researcher - Reinforcement Learning

Job in Markham, Ontario, Canada
Listing for: Huawei Canada
Contract position
Listed on 2025-12-30
Job specializations:
  • IT/Tech
    Data Scientist, Machine Learning/ ML Engineer, AI Engineer, Artificial Intelligence
Salary/Wage Range or Industry Benchmark: 30000 - 60000 CAD Yearly CAD 30000.00 60000.00 YEAR
Job Description & How to Apply Below

Join to apply for the Researcher - Reinforcement Learning role at Huawei Canada

Huawei Canada has an immediate 12-month contract opening for a Reinforcement Learning Researcher.

About the team

Founded in 2012, the Noah’s Ark lab has evolved into a prominent research organization with notable achievements in academia and industry. The lab’s mission focuses on advancing artificial intelligence and related fields to benefit the company and society. Driven by impactful, long‑term projects, the aim is to enhance state‑of‑the‑art research while integrating innovations into the company’s products and services, including LLMs, RL, NLP, computer vision, AI theory, and Autonomous driving.

About

the job

Enabling Large Language Models (LLMs) to learn from experience, interaction, and environment feedback, moving beyond static fine‑tuning toward continual, agentic self‑improvement.

LLM post‑training paradigms (e.g., RLHF, GRPO, reward‑free methods, etc.).

Agentic reinforcement learning for tool‑using and browsing‑based LLMs trained in interactive environments.

Agentic evaluation and benchmarking, including design of multi‑turn, verifiable reasoning tasks.

Your work will involve implementing and evaluating new training and evaluation pipelines for reasoning‑enhanced LLMs and tool‑using agents, scaling experiments on large GPU clusters, and contributing to scientific insights and publications in this emerging area.

About the ideal candidate
  • PhD degree in Computer Science or related fields or master’s degree with comparable experience.
  • Strong foundation in deep learning, including architectures such as Transformers and optimization techniques for large models.
  • Practical or research experience in reinforcement learning, self‑supervised learning, or language model fine‑tuning.
  • Proven research record in AI by having at least one paper as the first author in top tier venues, such as NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ICRA.
  • Solid proficiency in Python and experience with PyTorch, Deep Speed, Megatron and other distributed training frameworks.
  • Familiarity with LLM post‑training pipelines (RLHF, GRPO/PPO, SFT, LoRA, MoE, etc.) is an asset.
  • Experience with multi‑agent RL, tool‑use / browser/coding agents, is an asset.
  • Strong communication and writing skills; enthusiasm for open research and collaborative problem‑solving.
Seniority Level

Entry level

Employment type

Contract

Job function / Industries

Human Resources / Telecommunications

#J-18808-Ljbffr
Note that applications are not being accepted from your jurisdiction for this job currently via this jobsite. Candidate preferences are the decision of the Employer or Recruiting Agent, and are controlled by them alone.
To Search, View & Apply for jobs on this site that accept applications from your location or country, tap here to make a Search:
 
 
 
Search for further Jobs Here:
(Try combinations for better Results! Or enter less keywords for broader Results)
Location
Increase/decrease your Search Radius (miles)

Job Posting Language
Employment Category
Education (minimum level)
Filters
Education Level
Experience Level (years)
Posted in last:
Salary