Research Engineer/Scientist - Machine Learning RL & Optimisation; Contractor

Position: Research Engineer/Scientist - Machine Learning RL & Optimisation (Contractor)
Location: Greater London

About Huawei Research and Development UK Limited

Founded in 1987, Huawei is a leading global provider of information and communications technology (ICT) infrastructure and smart devices. We have 207,000 employees and operate in over 170 countries and regions, serving more than three billion people around the world.

Our vision and mission is to bring digital to every person, home and organization for a fully connected, intelligent world. To this end, we will drive ubiquitous connectivity and promote equal access to networks; bring cloud and artificial intelligence to all four corners of the earth to provide superior computing power where you need it, when you need it; build digital platforms to help all industries and organizations become more agile, efficient, and dynamic; and redefine user experience with AI, making it more personalized for people in all aspects of their life, whether they’re at home, in the office, or on the go.

This spirit of innovation has led Huawei to work in close partnership with leading academic institutions in the UK to develop and refine the latest technologies. With a shared commitment to innovation and progress, both parties have worked together to achieve common goals and establish a strong partnership. The partnership between the UK and Huawei helps to develop the technologies of the future that will transform the way we all communicate, work and live.

For the past 30 years we have maintained an unwavering focus, rejecting shortcuts and easy opportunities that don't align with our core business. With a practical approach to everything we do, we concentrate our efforts and invest patiently to drive technological breakthroughs.

This strategic focus is a reflection of our core values:

Staying customer-centric,
Inspiring dedication,
Persevering,
Growing by reflection.

Huawei Research and Development UK Limited Overview

Huawei’s vision is a fully connected, intelligent world. To achieve this, we work to inspire passion for basic research around the world. Our combined passion drives development across the global innovation value chain. Huawei has the largest Research and Development organization in the world with 96,000+ employees in research centers around the globe. In the UK, we already have design centers in Cambridge, London, Edinburgh and Ipswich.

We continue to explore and define new research directions and new services. We have expanded our collaborations with academic researchers; researched new network architectures, integration of communications and key enabling technologies; and developed the fundamental theories of these technologies. We invite you to join us on this exciting journey and drive your career forward.

Job Summary

Research and develop large-scale machine learning systems, alignment workflows, and optimization infrastructure to advance LLM reasoning and post-training capabilities. Design and execute scaled reinforcement learning pipelines (e.g., PPO, GRPO) utilizing distributed training frameworks (verl, trl, DeepSpeed, FSDP) integrated with high-performance inference engines (vLLM). Optimize low-level training throughput, kernel performance, and memory utilization across heterogeneous hardware clusters using expressive hardware DSLs (e.g., TileLang, Triton).

Advance the LLM orchestration loop and leverage Bayesian optimization to automate the search, generation, and continuous improvement of high-performance NPU kernels.

Key Responsibilities

Design and execute scaled RL fine-tuning workflows (e.g., PPO, GRPO) to enhance LLM reasoning, instruction-following, and alignment.
Architect and manage large-scale distributed training experiments across multi-node GPU clusters, optimizing for maximum throughput and hardware utilization.
Develop and maintain training infrastructure using advanced parallelization frameworks (verl, trl, DeepSpeed, FSDP) to support rapidly evolving research needs.
Integrate high-performance inference engines like vLLM directly into RL generation loops to reduce rollout latency and accelerate training cycles.
Implement robust profiling and debugging pipelines to diagnose bottlenecks in GPU memory, compute, and inter-node communication.
Collaborate with…
