
Senior MLOps Engineer

Job in Abu Dhabi, UAE/Dubai
Listing for: Institute of Foundation Models
Full Time position
Listed on 2025-12-10
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning / ML Engineer, Data Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 200,000 - 300,000 AED per year
Job Description & How to Apply Below

About the Institute of Foundation Models (IFM)

The Institute of Foundation Models is a dedicated research lab for building, understanding, deploying, and risk‑managing large-scale AI systems. We drive innovation in foundation models and their operationalization, empowering research, education, and adoption through scalable infrastructure and real‑world applications.

As part of our engineering team, you will operate at the intersection of machine learning and systems design — building the cloud, orchestration, and deployment layers that power the next generation of intelligent applications. You'll work alongside world‑class AI researchers and engineers to productionize LLMs, voice models, and multimodal systems at scale.

The Role

As a Senior MLOps Engineer, you will design, build, and maintain robust machine learning (ML) infrastructure across training, inference, and deployment pipelines. You will take ownership of the model lifecycle — from data ingestion to real‑time serving — and ensure our LLM and speech models are deployed efficiently, securely, and reproducibly in Kubernetes‑based environments.

This position requires deep hands‑on experience with Kubernetes (EKS), Helm, AWS cloud infrastructure, and modern MLOps toolchains (e.g., vLLM, SGLang, OpenWebUI, Weights & Biases, MLflow). Familiarity with speech/voice AI frameworks such as ElevenLabs, Whisper, and RVC is also valuable.
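
For concreteness, the following minimal sketch shows the kind of experiment tracking this toolchain implies, assuming MLflow is installed and using its default local file store (or an MLFLOW_TRACKING_URI); the run, parameter, and metric names are illustrative placeholders, not details from this listing:

    import mlflow

    # Assumes MLflow is installed; with no tracking server configured it logs
    # to a local ./mlruns file store. All names below are placeholders.
    with mlflow.start_run(run_name="llm-finetune-smoke-test"):
        mlflow.log_param("base_model", "example-7b")   # hypothetical model id
        mlflow.log_param("learning_rate", 3e-4)
        mlflow.log_metric("val_loss", 0.42, step=1)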

Key Responsibilities
  • Design and manage scalable ML infrastructure on AWS using EKS, EC2, RDS, S3, and IAM‑based access control.
  • Build and maintain Kubernetes deployments for LLM and TTS inference using Helm, ArgoCD, and Prometheus/Grafana monitoring.
  • Implement and optimize model serving pipelines using vLLM, SGLang, TensorRT, or similar frameworks for high‑throughput inference (a brief client sketch follows this list).
  • Develop CI/CD and MLOps automation for data versioning, model validation, and deployment (GitHub Actions, Jenkins, or AWS CodePipeline).
  • Integrate OpenWebUI, Gradio, or similar UIs for user‑facing model demos and internal evaluation tools.
  • Collaborate with ML researchers to productize models — including TTS (e.g., ElevenLabs API), ASR (Whisper), and LLM‑based chat systems.
  • Ensure observability, cost optimization, and reliability of cloud resources across multiple environments.
  • Contribute to internal tools for dataset curation, model monitoring, and retraining pipelines.
  • Maintain infrastructure‑as‑code using Terraform and Helm charts for reproducibility and governance.
  • Support real‑time multimodal workloads (voice, text, vision) across inference clusters.
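
As a concrete illustration of the serving work described in the list above, here is a minimal sketch of querying an OpenAI‑compatible vLLM or SGLang inference endpoint from Python; the endpoint URL, model name, and prompt are assumptions for illustration only:

    import requests

    # Assumes a vLLM or SGLang server exposing the OpenAI-compatible chat API
    # is already running (e.g. behind a Kubernetes Service). The URL and model
    # name are placeholders, not details from this listing.
    ENDPOINT = "http://localhost:8000/v1/chat/completions"

    payload = {
        "model": "example-llm",  # hypothetical served model name
        "messages": [{"role": "user", "content": "Summarize MLOps in one sentence."}],
        "max_tokens": 64,
        "temperature": 0.2,
    }

    resp = requests.post(ENDPOINT, json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
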
Qualifications
  • 4+ years of experience in MLOps, DevOps, or Cloud Infrastructure Engineering for ML systems.
  • Strong proficiency in Kubernetes, Helm, and container orchestration.
  • Experience deploying ML models via vLLM, SGLang, TensorRT, or Ray Serve.
  • Proficiency with AWS services (EKS, EC2, S3, RDS, CloudWatch, IAM).
  • Solid experience with Python, Docker, Git, and CI/CD pipelines.
  • Strong understanding of model lifecycle management, data pipelines, and observability tools (Grafana, Prometheus, Loki); a minimal instrumentation sketch follows this list.
  • Excellent collaboration skills with ML researchers and software engineers.
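
To make the observability expectation concrete, here is a minimal sketch of exposing Prometheus metrics from a Python inference service using the prometheus_client package; the metric names, scrape port, and simulated workload are assumptions:

    import random
    import time

    from prometheus_client import Counter, Histogram, start_http_server

    # Assumes the prometheus_client package is installed; names are placeholders.
    REQUESTS = Counter("inference_requests_total", "Inference requests served")
    LATENCY = Histogram("inference_latency_seconds", "Inference request latency")

    start_http_server(9100)  # expose /metrics for Prometheus to scrape

    for _ in range(100):                               # stand-in for a serving loop
        with LATENCY.time():
            time.sleep(random.uniform(0.01, 0.1))      # placeholder for a model call
        REQUESTS.inc()

    time.sleep(300)  # keep the process alive so the /metrics endpoint can be scraped
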
Professional Experience – Preferred
  • Extensive experience with vLLM, K8s, ElevenLabs, Whisper, Gradio/OpenWebUI, or custom TTS/ASR model hosting.
  • Familiarity with multi‑GPU scheduling, NCCL optimization, and HPC cluster integration.
  • Knowledge of security, cost management, and network policy in multi‑tenant Kubernetes clusters and Cloudflare systems.
  • Prior work in LLM deployment, fine‑tuning pipelines, or foundation model research.
  • Exposure to data governance and responsible AI operations in research or enterprise settings.
Position Requirements
10+ years of work experience