
DevOps Engineer

Job in Dubai, UAE
Listing for: MBR Partners
Full Time position
Listed on 2026-01-03
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: AED 120,000 - 200,000 per year
Job Description & How to Apply Below

DevOps Engineer

Join M  Partners as a DevOps Engineer in the dynamic high‑tech landscape of Dubai, UAE. M  Partners is the exclusive software partner to one of the world’s largest ODMs in the networking equipment space, developing network operating systems that power critical data centre and telecom routing and switching infrastructure. The company has recently launched an AI division focused on designing custom chips to accelerate inference and training workloads.

The company is building a true networking vendor and a thriving ecosystem for embedded systems and ASIC design talent across the MENA region.

Mission

Own the end‑to‑end design and operation of on‑premises infrastructure for AI and enterprise workloads: built as code, automated, observable, and secure. Architect and run Kubernetes clusters for training and inference, manage servers, networks, and core services, and enable developers with reliable CI/CD and platform tooling. Your work directly impacts AI development velocity at scale.

Responsibilities
  • Design and operate on‑prem infrastructure as code: author reusable Terraform/Ansible/Helm modules and build GitOps workflows (e.g., Argo CD) for repeatable, audited changes across environments.
  • Build and run Kubernetes for AI: configure multi‑tenant GPU clusters (MIG/GPUDirect RDMA, NVIDIA device plugins/DCGM), scheduling/quotas, HPA/Cluster Autoscaler where applicable, and workload isolation.
  • Administer servers, networks, and core services: OS lifecycle (Linux), identity/SSO (Keycloak/LDAP), secrets (Vault), DNS/DHCP/NTP, artifact registries, and internal package mirrors.
  • Provide storage for AI pipelines: integrate and operate high‑bandwidth, low‑latency storage and tune it for dataset staging and checkpointing patterns.
  • Enable CI/CD: partner with developers to design fast, reproducible pipelines (GitLab CI/GitHub Actions), with caching and runners on GPU/CPU nodes and artifact provenance (SBOM, SLSA).
  • Collaborate with platform, ML, silicon, systems, security, application developers, and site ops to turn infrastructure into a product that accelerates the business.
Minimum Qualifications
  • 5+ years in DevOps/SRE/Platform Engineering with hands‑on ownership of on‑prem hardware.
  • Proven experience operating Kubernetes in production (multi‑tenant RBAC, networking/CNI).
  • Proficiency with IaC and automation (Terraform, Ansible, Helm; GitOps with Argo CD/Flux).
  • Strong Linux administration, scripting (Bash/Python), and troubleshooting across compute, network, and storage stacks.
  • CI/CD expertise (GitLab CI/GitHub Actions) and container build security (SBOM, image signing).
  • Solid networking fundamentals (L2/L3, routing, BGP, VLANs, EVPN/VXLAN, load balancing, TLS/mTLS).
  • Experience implementing observability (Prometheus/Grafana, logs, tracing) and running incident response.
Preferred (Nice‑to‑Haves)
  • GPU cluster operations for AI (NVIDIA drivers/operator, DCGM, MIG, GPUDirect RDMA, Slurm integration).
  • Storage for data‑intensive workloads (Ceph, parallel file systems, NVMe‑oF) and performance tuning.
  • Secrets/identity platforms (Vault, Keycloak/LDAP/SSO), policy‑as‑code (OPA/Gatekeeper, Kyverno).
  • Security/compliance practices (CIS benchmarks, SLSA, supply‑chain scanning) and zero‑trust networking.
  • Data centre experience (rack/stack, power/cooling basics) and remote site rollout automation.
  • Familiarity with configuration management for network devices and API‑driven switches/routers.
  • Reproducible environments: spin up identical dev/test stacks from Git in ≤30 minutes with audit trails for every change.
  • Solid CI/CD for AI workflows: deterministic, cache‑efficient pipelines with artifact provenance, with median pipeline time down 30–50%.
  • Predictable GPU orchestration: fair‑share scheduling, quotas, and isolation (MIG/namespace policies) keep queues short and raise cluster utilisation by more than 20%.
  • Lab‑to‑cluster continuity: versioned hardware bring‑up images, drivers, and firmware are promoted through pipelines, so new boards and nodes join clusters with push‑button automation.
  • Actionable observability: dashboards and alerts reflecting SLOs that matter to researchers; reduced MTTR; 80% of routine requests resolved via self‑service workflows.
Other Information

Recruitment:
Referral program increases interview chances by 2×.

Travel & Visa:
The client can obtain work visas for Dubai and provides flights and visa support; accommodation is not provided. Salary is flexible depending on the candidate's profile.

Location:

Global Village, Dubai, United Arab Emirates.
