High Performance Computing Software Engineer - Supercomputing

Job in Abu Dhabi, UAE/Dubai
Listing for: Institute of Foundation Models
Full Time position
Listed on 2025-12-10
Job specializations:
  • Software Development
    AI Engineer, Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 120000 - 200000 AED Yearly
Job Description & How to Apply Below
Position: High Performance Computing Software Engineer - Supercomputing

About the Institute of Foundation Models

We are a dedicated research lab for building, understanding, using, and risk‑managing foundation models. Our mandate is to advance research, nurture the next generation of AI builders, and drive transformative contributions to a knowledge‑driven economy.

As part of our team, you’ll have the opportunity to work on the core of cutting‑edge foundation model training alongside world‑class researchers, data scientists, and engineers, tackling the most fundamental and impactful challenges in AI development. You will participate in the development of groundbreaking AI solutions that have the potential to reshape entire industries. Strategic and innovative problem‑solving skills will be instrumental in establishing MBZUAI as a global hub for high‑performance computing in deep learning, driving impactful discoveries that inspire the next generation of AI pioneers.

The Role

IFM is building the foundational compute infrastructure that will power tomorrow’s breakthroughs in AI and computational science. We’re looking for a High Performance Computing Software Engineer to help us design, develop, and operate the software systems that run our large‑scale AI workloads.

In this role, you’ll work at the intersection of high‑performance computing and machine learning. You’ll be part of a team responsible for crafting the software stack that enables training of cutting‑edge ML models—spanning 1000+ GPUs—and ensuring our infrastructure is robust, performant, and developer‑friendly.

Job Responsibilities
  • Design and implement high‑performance, distributed software solutions for large‑scale AI/ML training.
  • Optimize low‑level system components including Linux kernel, GPU/accelerator kernels, and interconnects.
  • Develop and tune communication libraries such as NCCL, MPI, UCX, RCCL, and RDMA‑based systems.
  • Partner with ML researchers and engineers to support frameworks like PyTorch, Megatron-LM, and DeepSpeed in large‑scale production environments.
  • Contribute to our scheduling, orchestration, and job management systems, including Slurm and Kubernetes.
  • Debug and resolve complex issues across the stack—from kernel to container to model.
  • Work closely with hardware vendors, upstream open‑source communities, and internal teams to drive performance and reliability improvements.
Skills & Experience
  • Proven experience developing and optimizing software for large‑scale ML workloads (1000+ GPUs preferred).
  • Deep understanding of Linux kernel internals and accelerator (GPU) kernel development.
  • Proficiency with distributed communication libraries (e.g., NCCL, RCCL, MPI, UCX, SHARP, Libfabric).
  • Experience with ML frameworks like PyTorch, TensorFlow, JAX, or Megatron-LM.
  • Strong knowledge of HPC job scheduling and orchestration tools (e.g., Slurm, Kubernetes, Pyxis).
  • Excellent debugging and systems performance tuning skills.
  • A collaborative mindset with a focus on shared success and technical excellence.