Engineer - ML & RL Job, Edmonton area, Alberta, Canada, IT/Tech

Job description

Huawei Canada has an immediate 12-month contract opening for an Engineer.

About the team:

The Software-Hardware System Optimization Lab continuously improves the power efficiency and performance of smartphone products through software-hardware systems optimization and architecture innovation. We track cutting-edge technology trends and build competitive strength in mobile AI, graphics, multimedia, and software architecture for mobile phone products.

About the job:

Design and build scalable infrastructure to support Reinforcement Learning, Online Search, Recommendation Systems, and large-model fine-tuning, evaluation, and deployment.

Develop efficient ML solutions for Recommendation Systems and RL problems, including Multi-Armed and Contextual Bandits, Tree Search, and Multi-Agent system orchestration.

Implement and optimize deep learning architectures, including custom Transformers for agentic and decision-making systems.

Apply search and optimization techniques to efficiently fine-tune RL and ML models.

Work with large multimodal models (LLMs, VLMs), analyze their components, and fine-tune them for task-specific applications.

Conduct systematic benchmarking, literature review, experimentation, and validation in both simulation and real-world product environments.

Collaborate closely with research teams to scale online RL training capabilities and improve system robustness and accuracy.

Explore and integrate emerging AI methodologies and tools into production platforms.

Job requirements

About the ideal candidate:

Master’s or PhD in Computer Science, Machine Learning, or a related field.

Excellent Python programming skills with strong software engineering practices.

Strong foundation in Reinforcement Learning, Deep Learning, Recommender Systems, and Transformer-based architectures.

Demonstrated experience implementing RL algorithms beyond academic prototypes.

Hands-on experience with PyTorch and distributed training frameworks such as DeepSpeed.

Proven research excellence, including at least one publication in top-tier venues (e.g., NeurIPS, ICML, ICLR, CVPR, ICCV, ECCV, ICRA, RLC).

Familiarity with LLM post-training techniques such as RLHF, PPO/GRPO, SFT, LoRA, or MoE is an asset.

Experience with multi-agent RL systems or tool-use agents is an asset.

Additional Information:

Huawei Canada is committed to a fair, inclusive, and accessible recruitment process. If you require accommodation during any stage of the hiring process, please let us know and we will work with you to meet your needs.

All applications for this position are reviewed directly by our hiring team. We do not use artificial intelligence tools to screen or select candidates.