
Senior AI Infrastructure & Platform Engineer KSA

Job in Riyadh, Saudi Arabia
Listing for: DeepSource Technologies
Full Time position
Listed on 2026-06-14
Job specializations:
  • IT/Tech
    Cloud Computing: Infrastructure & Operations, Systems Engineer, IT Infrastructure, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: SAR 200,000 - 300,000 per year
Job Description & How to Apply Below

Role Overview

We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.

Key Responsibilities
  • Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
  • Manage and operate GPU orchestration tools and platforms such as:
    Nvidia Base Command Manager (critical), Nvidia AI Enterprise Suite, Nvidia GPU and Network Operators, Nvidia NIMs and Blueprints.
  • Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
    Slurm (critical), vanilla Kubernetes.
  • Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
  • Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
  • Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
  • Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
  • Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.
Required Skills & Experience
  • Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
  • Hands‑on experience with:
    Nvidia Base Command Manager, Nvidia AI Enterprise Suite, Nvidia GPU/Network Operators, NIMs, Blueprints.
  • Strong experience with Slurm and/or Kubernetes orchestration.
  • Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
  • Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
  • Excellent troubleshooting and performance-tuning skills.
  • Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
  • Strong understanding of networking, security, resource allocation, and cluster management best practices.
Preferred Qualifications
  • Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
  • Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
  • Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
  • Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.
Position Requirements
10+ years of work experience