Consultant; HPC Job Dubai area,Dubai UAE/Dubai,IT/Tech

Position: Consultant (HPC)

HPC Engineer

Location: Riyadh, Saudi Arabia

Seniority Level: Mid-Senior level

Employment Type: Contract

Job Description

Architect, deploy, and optimise large‑scale, performance‑critical HPC environments that support tightly coupled MPI workloads, GPU‑accelerated applications, and data‑intensive simulations. Focus on extracting maximum performance from compute, network, and storage layers through low‑level tuning, benchmarking, and continuous optimisation.

Responsibilities

Design, architect, and operate HPC clusters of 100–500+ nodes, including CPU‑only and GPU‑accelerated systems optimised for tightly coupled parallel workloads.
Design and tune MPI‑based environments (OpenMPI, Intel MPI, MPICH), including process pinning, CPU affinity, memory binding, and topology‑aware scheduling.
Optimise NUMA architectures, huge pages, CPU isolation, BIOS/firmware settings, and kernel parameters for latency‑ and throughput‑sensitive workloads.
Deploy and tune high‑speed interconnects (InfiniBand / RDMA / RoCE), including fabric configuration, QoS, congestion control, and performance validation.
Configure and operate GPU‑accelerated HPC systems, including CUDA‑aware MPI, NCCL, GPUDirect RDMA, and multi‑GPU/NVLink topologies.
Manage and tune job schedulers (Slurm, PBS, LSF) with advanced configurations such as topology‑aware scheduling, GPU binding, fair‑share policies, and preemption.
Design and optimise parallel file systems (Lustre, GPFS, BeeGFS), including metadata tuning, stripe configuration, and I/O performance optimisation.
Develop and maintain automation frameworks (Ansible, Bash, Python) for bare‑metal provisioning, cluster expansion, and repeatable performance configurations.
Perform HPC benchmarking and performance analysis using tools such as HPL, IOzone, IOR, FIO, OSU Micro‑Benchmarks, and application‑level profiling.
Partner with researchers and engineering teams to profile, debug, and tune applications, improving scalability, efficiency, and time‑to‑solution.
Implement system‑level security, reliability, and compliance controls without compromising performance.
Lead capacity planning, scalability assessments, and next‑generation HPC architecture evaluations.

Qualifications

Bachelor’s degree in Computer Science, Engineering, or a related technical field.
7+ years of deep, hands‑on HPC experience in performance‑critical environments (academic labs, national labs, research centres, or enterprise HPC).
Expert‑level knowledge of MPI programming environments and performance tuning for large‑scale parallel jobs.
Strong understanding of NUMA, CPU pinning, memory locality, cache behaviour, and kernel‑level tuning.
Hands‑on experience with RDMA‑capable networks (InfiniBand / RoCE), including fabric monitoring and troubleshooting.
Proven experience with GPU‑enabled HPC clusters, CUDA, CUDA‑aware MPI, and GPUDirect technologies.
Advanced experience managing Slurm / PBS / LSF in large, heterogeneous clusters.
Deep expertise in parallel storage performance tuning (Lustre, GPFS, BeeGFS).
Strong knowledge of Linux internals, particularly on Red Hat Enterprise Linux.
RHCE or equivalent Linux certification preferred.
Experience with benchmark‑driven design decisions and performance regression analysis.
Ability to communicate low‑level performance issues to both technical and non‑technical stakeholders.

Job Function

Analyst and Product Management, Telecommunications
