Head of SRE (Dubai, UAE; IT/Tech)

Our client, a globally established and highly regulated financial services group headquartered in Dubai, operates across multiple trading and digital finance verticals, including FX, CFDs, and emerging fintech platforms. The organization runs high-volume, latency-sensitive systems for a global client base and is well advanced in a program to modernize its platform and strengthen scalability and operational resilience.

Role Overview

The Head of SRE is a senior leadership role responsible for owning the reliability, availability, and performance of all production systems. This role bridges engineering and operations, embedding reliability into system design while maintaining strong controls suited to a regulated financial environment. The position oversees incident management, observability, automation, and infrastructure resilience, working closely with Engineering, Infrastructure, Security, and Product leadership to support 24/7 operations.

Key Responsibilities

Own end-to-end platform reliability, availability, and performance across trading, client-facing, and internal systems
Define and implement SRE strategy, frameworks, and operating models aligned with business scale and regulatory expectations
Build, lead, and mentor a high-performing SRE team across cloud, infrastructure, and reliability domains
Establish and monitor service level indicators (SLIs), service level objectives (SLOs), and error budgets to balance system stability with delivery velocity
Lead incident management, escalation, root cause analysis, and post-incident reviews with clear accountability and learning outcomes
Design and maintain observability standards covering monitoring, alerting, logging, and tracing
Drive automation initiatives to reduce manual intervention, improve deployment reliability, and enhance operational efficiency
Partner with Engineering teams to embed reliability, scalability, and resilience into system architecture and release cycles
Collaborate with Security, Compliance, and Risk teams to ensure platform reliability aligns with regulatory and audit requirements
Support capacity planning, disaster recovery, business continuity, and stress testing for high-traffic and volatile market conditions
Provide executive-level reporting on platform health, risks, and reliability metrics

Requirements

Bachelor’s degree in Computer Science, Engineering, or a related technical field (Master’s preferred)
Proven experience in senior SRE, DevOps, or Infrastructure leadership roles within high-availability environments
Strong background supporting large-scale, distributed, and latency-sensitive systems
Hands-on expertise with cloud platforms (AWS, GCP, or Azure), containerization, and orchestration technologies
Deep understanding of reliability engineering principles, incident response, and observability tooling
Experience designing and operating systems with 24/7 uptime requirements
Strong automation skills using infrastructure-as-code and CI/CD practices
Prior exposure to regulated industries such as financial services, fintech, trading, or payments is highly preferred
Ability to operate at both strategic and hands-on levels in fast-paced environments
Excellent leadership, stakeholder management, and communication skills
