
Senior AI Platform Engineer, L2/L3 Operations (VMs & OpenShift)

Job in Riyadh, Riyadh Region, Saudi Arabia
Listing for: Astek Middle East
Full Time position
Listed on 2026-01-01
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 200,000 - 300,000 SAR yearly
Job Description & How to Apply Below
Position: Senior AI Platform Engineer, L2/L3 Operations (VMs & OpenShift)

We are seeking a Senior AI Platform Resident Engineer to lead L2/L3 operations, reliability, and production readiness for enterprise AI platform components deployed across virtual machines and OpenShift environments.

This role is highly operational and hands‑on, focused on the stability, observability, scalability, and security of AI runtime services, including model inference, vector databases, messaging, and conversational platforms. You will play a key role in closing operational gaps, defining runbooks, and ensuring reliable service delivery in a restricted, on‑premises environment.

Key Responsibilities
AI Platform & Vector Systems
  • Operate and support LLM inference services (e.g., vLLM) across VMs and OpenShift
  • Support Qdrant (vector search), Kafka, and Rasa in production environments
  • Implement performance tuning, scaling strategies, security hardening, and observability
  • Develop L2 operational runbooks and define clear L3/vendor escalation paths
Messaging & Caching
  • Manage Kafka and Redis clusters with high availability
  • Perform tuning, capacity planning, backup/restore, and failure recovery
  • Monitor throughput, latency, and resource utilization
Platform Operations (VMs & OpenShift)
  • Deploy, manage, and harden services on VM‑based platforms and OpenShift clusters
  • Apply RBAC, TLS, audit logging, resource quotas, autoscaling, and health checks
  • Support CI/CD rollouts and standardize deployment and release processes
Reliability & Observability
  • Build and maintain metrics, logs, alerts, dashboards, and SLO/SLA monitoring
  • Lead incident response, root cause analysis (RCA), and post‑incident reviews
  • Execute disaster recovery (DR) testing and resilience validation
Knowledge Transfer & Operational Readiness
  • Identify L2 capability gaps and deliver structured operational training
  • Define SLOs, RPO/RTO, escalation workflows, and production readiness checklists
  • Improve documentation and operational maturity across teams
Scope Clarification
  • PostgreSQL and MongoDB are out of L2 scope and are handled by other teams
Qualifications
  • 7+ years operating distributed systems in production environments
  • 3+ years hands‑on experience with OpenShift and/or Kubernetes
  • Strong expertise in Linux, networking, observability, and security hardening
  • Experience supporting Kafka, Redis, Qdrant, Rasa, or LLM inference frameworks
  • Proven experience in L2/L3 support, incident management, and escalation handling

Location: Riyadh, Saudi Arabia
Seniority level: Mid‑Senior
Employment type: Full‑time
Job function: Information Technology and Engineering

Position Requirements
10+ Years work experience