Our client, a globally established and highly regulated financial services group headquartered in Dubai, operates across multiple trading and digital finance verticals including FX, CFDs, and emerging fintech platforms. The organization supports high-volume, latency-sensitive systems serving a global client base and is at an advanced stage of its work on platform modernization, scalability, and operational resilience.
Role Overview
The Head of SRE is a senior leadership role responsible for owning the reliability, availability, and performance of all production systems. This role bridges engineering and operations, embedding reliability into system design while maintaining strong controls suited to a regulated financial environment. The position oversees incident management, observability, automation, and infrastructure resilience, working closely with Engineering, Infrastructure, Security, and Product leadership to support 24/7 operations.
Key Responsibilities
- Own end-to-end platform reliability, availability, and performance across trading, client-facing, and internal systems
- Define and implement SRE strategy, frameworks, and operating models aligned with business scale and regulatory expectations
- Build, lead, and mentor a high-performing SRE team across cloud, infrastructure, and reliability domains
- Establish and monitor SLIs, SLOs, and error budgets to balance system stability with delivery velocity
- Lead incident management, escalation, root cause analysis, and post-incident reviews with clear accountability and learning outcomes
- Design and maintain observability standards covering monitoring, alerting, logging, and tracing
- Drive automation initiatives to reduce manual intervention, improve deployment reliability, and enhance operational efficiency
- Partner with Engineering teams to embed reliability, scalability, and resilience into system architecture and release cycles
- Collaborate with Security, Compliance, and Risk teams to ensure platform reliability aligns with regulatory and audit requirements
- Support capacity planning, disaster recovery, business continuity, and stress testing for high-traffic and volatile market conditions
- Provide executive-level reporting on platform health, risks, and reliability metrics
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field (Master’s preferred)
- Proven experience in senior SRE, DevOps, or Infrastructure leadership roles within high-availability environments
- Strong background supporting large-scale, distributed, and latency-sensitive systems
- Hands-on expertise with cloud platforms (AWS, GCP, or Azure), containerization, and orchestration technologies
- Deep understanding of reliability engineering principles, incident response, and observability tooling
- Experience designing and operating systems with 24/7 uptime requirements
- Strong automation skills using infrastructure-as-code and CI/CD practices
- Prior exposure to regulated industries such as financial services, fintech, trading, or payments is highly preferred
- Ability to operate at both strategic and hands-on levels in fast-paced environments
- Excellent leadership, stakeholder management, and communication skills