Senior Site Reliability Engineering Manager Job, London area, Greater London, England, UK, IT/Tech

Location: Greater London

The Senior Manager, Site Reliability Engineering (London) is an experienced leader responsible for overseeing a globally distributed team of SRE technologists whose skills range from software development to systems, network, application, and/or database management, with deep subject matter expertise in one or more of these disciplines.
This role sits at the heart of Cboe’s follow-the-sun support model for its US Global Trading Hours (GTH) markets.
Based in London, the Senior SRE Manager provides direct platform support for Cboe’s European operations while also holding oversight responsibility for SRE staff across both the European and Asia-Pacific time zones, ensuring seamless, continuous coverage of Cboe’s real-time, low-latency trading platforms around the clock.
The Senior SRE Manager will play a key role in providing support and guidance throughout the full project lifecycle to deliver operational requirements on schedule, drive strategy across multiple areas of the organization, and tackle complex problems that may lack a clear or complete strategic definition.
Technical Leadership & System Availability:
Provide technical leadership, support, and operational oversight to sustain resiliency and high availability of critical business operations across European and GTH market sessions
Monitor Cboe production, disaster recovery, and certification systems for issues
Troubleshoot and drive resolution of issues
Analyze and optimize performance of real‑time trading platforms
Oversee daily system checks and ensure Cboe platforms and systems are operating as expected
Take direct action to resolve known issues as needed
Assist the build team to resolve build/deployment issues
People Leadership & Team Development:
Lead, mentor, and provide guidance to direct reports across the European and APAC time zones responsible for platform support
Delegate assignments to direct reports
Create and execute Agile-based processes such as Kanban and Scrum to actively manage the team's workload, ensuring task completion in support of business projects and internal customer timelines
Actively and intentionally connect direct reports to others within their team, department, and across the organization
Support training and development needs to create a best‑in‑class SRE team
Establish operational objectives, policies, and procedures
Interact regularly with management on matters concerning multiple functional areas, departments, and/or customers
Liaise with business associates, infrastructure engineers, software engineers, and Cboe management
Platform Configuration Management & Project Oversight:
Develop and manage operational initiatives to deliver tactical results
Translate functional plans into operational processes and guide execution, providing project management support for all updates applicable to platforms of responsibility
Provide configuration management for new and existing trading platforms and support implementation of new features and functionality based on new business requirements
While the primary focus of this role involves support of bare‑metal on‑premises infrastructure, experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes) is desirable
Monitor development activities and change management tickets, and evaluate their impact on Cboe operations
Approve and execute daily change tickets assigned to Site Reliability Engineering
Organize testing of changes prior to deployment and work with software engineering to resolve systemic issues
Demonstrate knowledge of Compliance obligations impacting regulated platforms and work closely with Compliance staff to ensure incident triage, reporting, and remediation obligations are met
Incident Response & Escalation Management:
Serve as the senior escalation point for production incidents across European and GTH market hours
Coordinate incident triage, root cause analysis, and resolution across globally distributed engineering and operations teams
Provide timely, precise communication to stakeholders during active incidents and drive post‑incident reviews and remediation tracking to deliver long‑term platform stability
Subject…