
AI Infrastructure Engineer

Job in Sharjah, UAE/Dubai
Listing for: Dautom
Full Time position
Listed on 2026-05-31
Job specializations:
  • IT/Tech
    AI Engineer, Systems Engineer, Data Engineer, Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 120,000 - 200,000 AED yearly
Job Description & How to Apply Below

The AI Infrastructure Engineer is a platform specialist responsible for architecting, building, and operating high-performance AI infrastructure to support advanced AI workloads, including LLMs, GenAI, Computer Vision, and MLOps. The role focuses on managing GPU clusters (NVIDIA A100/H100), deploying and maintaining Red Hat OpenShift AI (RHODS), and ensuring secure, scalable, and cost-efficient AI platforms across SDD’s Sovereign Cloud and hybrid/multi-cloud environments.

The engineer will enable enterprise-grade AI adoption for 200+ government entities.

Key Responsibilities & Deliverables

GPU & AI Platform Architecture

Design and implement GPU-based compute clusters. Define reference architectures for LLM hosting, Vector Databases, MLOps, and high-performance storage/networking.

Fully operational GPU-based AI infrastructure. GPU Cluster Uptime and Performance Utilization. Reduction in Cost per Training/Inference Workload.

GPU Cluster Operations

Install, configure, and optimize core components: CUDA, cuDNN, NCCL, NVIDIA Drivers, and GPU Operators. Implement GPU partitioning, scheduling, and performance tuning for high-end GPUs (e.g., A100/H100).

High-availability architecture for all AI workloads. Complete documentation and runbooks.

OpenShift AI (RHODS) Management

Deploy, configure, and maintain the Red Hat OpenShift AI (RHODS) platform for multi-tenant use. Manage the integration of the NVIDIA GPU Operator for efficient GPU scheduling, and support Data Scientists with Notebooks, Training, and Inference Endpoints.

Production-ready OpenShift AI (RHODS) platform. AI Project Onboarding Speed.

LLM & Model Serving

Build and manage infrastructure for hosting and serving open-source LLMs (Llama, Falcon, Mistral) and for supporting RAG pipelines, LoRA adapters, and Vector Databases (Milvus, pgvector).

Multi-model LLM serving environment for entities. MLOps Pipeline Success Rate and Deployment Frequency.

MLOps & Automation

Implement IaC (Terraform, Ansible) and GitOps for automated lifecycle management of the AI platform (node onboarding, scaling, model rollout/rollback). Build robust MLOps pipelines for data prep, training, evaluation, and monitoring (using tools like MLflow/Kubeflow).

Infrastructure automation via Terraform & Ansible. Automation Coverage for AI Infrastructure.

Required Qualifications & Experience
  • Experience:
    7–12 years in Cloud Infrastructure, DevOps, ML Infrastructure, or Platform Engineering.
  • Deep Hands-On Expertise:
    • GPU Systems (NVIDIA A100/H100), Linux, Containers, and Kubernetes.
    • OpenShift AI (RHODS) or equivalent Kubernetes GPU orchestration.
    • LLM Hosting (Llama, Mistral, Falcon, etc.) and supporting Vector Databases/RAG systems.
  • Strong Experience In:
    TensorFlow, PyTorch, Hugging Face, Distributed Training (DDP, DeepSpeed), and MLOps Stacks (MLflow, Kubeflow).
Essential Skills & Competencies
  • Technical:
    Deep understanding of GPU compute, HPC architectures, and ML performance profiling. Strong skills in IaC (Terraform/Ansible), CI/CD, and OpenShift/Kubernetes operators.
  • Soft Skills:
    Strong troubleshooting, optimization, and performance engineering mindset. Excellent cross-functional collaboration and documentation skills.
Preferred Certifications
  • NVIDIA Deep Learning / AI Infrastructure Certification
  • Red Hat OpenShift AI specialization
  • Kubernetes CKA/CKAD
  • Azure AI or Oracle Cloud AI certifications
  • Terraform & Ansible certifications