
HPC Engineer

Job in Riyadh, Riyadh Region, Saudi Arabia
Listing for: Penta Consulting
Full Time position
Listed on 2026-06-19
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing: Infrastructure & Operations, IT Infrastructure
Salary/Wage Range or Industry Benchmark: 120,000 - 150,000 SAR yearly
Job Description & How to Apply Below

Penta Consulting are a technology service provider and leading outsourced partner helping to deliver professional and managed solutions across EMEA.

We are seeking an experienced Senior Infrastructure HPC Engineer who has personally designed, deployed, configured, and operated every component of a large-scale high-performance computing environment.

Key Responsibilities
  • Design, deploy, and maintain HPC clusters end-to-end: compute nodes, storage tiers, high-speed networking (InfiniBand / RoCE), and management fabric.
  • Personally provision and administer NVIDIA Base Command Manager (BCM) for bare-metal cluster imaging, OS lifecycle, and GPU fleet health monitoring.
  • Deploy and manage the full NVIDIA AI Enterprise Suite: install, license, update, and integrate with MLOps pipelines (NeMo, Triton, RAPIDS).
  • Deploy and operate NVIDIA GPU Operator and Network Operator on Kubernetes to automate driver and CUDA lifecycle, DCGM exporter, and MIG configuration.
  • Configure and serve NVIDIA NIM inference endpoints; implement NVIDIA Blueprint reference architectures for production AI workloads.
  • Install, administer, and tune Slurm: partitions, QOS, fair-share policies, node accounting, MPI integration, and Slurm-on-Kubernetes hybrid scheduling.
  • Bootstrap and operate Kubernetes clusters using kubeadm, including control plane HA, etcd backup, and zero-downtime upgrades.
  • Administer RHEL / Canonical Ubuntu across all cluster nodes.
  • Build and maintain CI/CD pipelines (GitLab CI / GitHub Actions) for infrastructure provisioning and HPC software delivery.
  • Profile and tune GPU and CPU workload performance; resolve bottlenecks across hardware, drivers, MPI fabric, and application layers.
  • Implement cluster monitoring with Prometheus, Grafana, and DCGM; define alerting and capacity planning thresholds.
  • Enforce security best practices: node hardening, kernel patching, RBAC, and compliance audits across the HPC environment.