HPC Engineer
Location: Riyadh, Saudi Arabia
Seniority Level: Mid-Senior level
Employment Type: Contract
Job Description
Architect, deploy, and optimise large‑scale, performance‑critical HPC environments that support tightly coupled MPI workloads, GPU‑accelerated applications, and data‑intensive simulations. Focus on extracting maximum performance from compute, network, and storage layers through low‑level tuning, benchmarking, and continuous optimisation.
Responsibilities
- Design, architect, and operate HPC clusters of 100–500+ nodes, including CPU‑only and GPU‑accelerated systems optimised for tightly coupled parallel workloads.
- Design and tune MPI‑based environments (OpenMPI, Intel MPI, MPICH), including process pinning, CPU affinity, memory binding, and topology‑aware scheduling.
- Optimise NUMA architectures, huge pages, CPU isolation, BIOS/firmware settings, and kernel parameters for latency‑ and throughput‑sensitive workloads.
- Deploy and tune high‑speed interconnects (InfiniBand / RDMA / RoCE), including fabric configuration, QoS, congestion control, and performance validation.
- Configure and operate GPU‑accelerated HPC systems, including CUDA‑aware MPI, NCCL, GPUDirect RDMA, and multi‑GPU/NVLink topologies.
- Manage and tune job schedulers (Slurm, PBS, LSF) with advanced configurations such as topology‑aware scheduling, GPU binding, fair‑share policies, and preemption.
- Design and optimise parallel file systems (Lustre, GPFS, BeeGFS), including metadata tuning, stripe configuration, and I/O performance optimisation.
- Develop and maintain automation frameworks (Ansible, Bash, Python) for bare‑metal provisioning, cluster expansion, and repeatable performance configurations.
- Perform HPC benchmarking and performance analysis using tools such as HPL, IOzone, IOR, FIO, OSU Micro‑Benchmarks, and application‑level profiling.
- Partner with researchers and engineering teams to profile, debug, and tune applications, improving scalability, efficiency, and time‑to‑solution.
- Implement system‑level security, reliability, and compliance controls without compromising performance.
- Lead capacity planning, scalability assessments, and next‑generation HPC architecture evaluations.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- 7+ years of deep, hands‑on HPC experience in performance‑critical environments (academic labs, national labs, research centres, or enterprise HPC).
- Expert‑level knowledge of MPI programming environments and performance tuning for large‑scale parallel jobs.
- Strong understanding of NUMA, CPU pinning, memory locality, cache behaviour, and kernel‑level tuning.
- Hands‑on experience with RDMA‑capable networks (InfiniBand / RoCE), including fabric monitoring and troubleshooting.
- Proven experience with GPU‑enabled HPC clusters, CUDA, CUDA‑aware MPI, and GPUDirect technologies.
- Advanced experience managing Slurm / PBS / LSF in large, heterogeneous clusters.
- Deep expertise in parallel storage performance tuning (Lustre, GPFS, BeeGFS).
- Strong knowledge of Linux internals, particularly on Red Hat Enterprise Linux.
- RHCE or equivalent Linux certification preferred.
- Experience with benchmark‑driven design decisions and performance regression analysis.
- Ability to communicate low‑level performance issues to both technical and non‑technical stakeholders.
Analyst and Product Management, Telecommunications