Senior Site Reliability Engineering Manager
Job in London, Greater London, W1B, England, UK
Listed on 2026-06-27
Listing for:
Cboe
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, IT Project Manager, IT Support, Cloud Computing: Infrastructure & Operations
Job Description & How to Apply Below
- The Senior Manager, Site Reliability Engineering (London) is an experienced leader responsible for overseeing a globally distributed team of SRE technologists with diverse skills ranging from software development to systems, network, application, and/or database management — with deep subject matter expertise in one or more of these disciplines
- This role sits at the heart of Cboe’s follow-the-sun support model for its US Global Trading Hours (GTH) markets
- Based in London, the Senior SRE Manager provides direct platform support for Cboe’s European operations while also holding oversight responsibility for SRE staff across both the European and Asia-Pacific time zones, ensuring seamless, continuous coverage of Cboe’s real-time low-latency trading platforms around the clock
- The Senior SRE Manager will play a key role supporting and providing guidance throughout the full project lifecycle to deliver operational requirements on schedule, drive strategy across multiple areas of the organization, and tackle complex problems that may lack clear or full strategic definition
- Technical Leadership & System Availability:
- Provide technical leadership, support, and operational oversight to sustain resiliency and high availability of critical business operations across European and GTH market sessions
- Monitor Cboe production, disaster recovery, and certification systems for issues
- Troubleshoot and drive resolution of issues
- Analyze and optimize performance of real‑time trading platforms
- Oversee daily system checks and ensure Cboe platforms and systems are operating as expected
- Take direct action to resolve known issues as needed
- Assist the build team to resolve build/deployment issues
- People Leadership & Team Development:
- Lead, mentor, and provide guidance to direct reports across the European and APAC time zones responsible for platform support
- Delegate assignments to direct reports
- Create and run Agile-based processes such as Kanban and Scrum to actively manage the team's workload, ensuring tasks are completed in support of business projects and internal customer timelines
- Actively and intentionally connect direct reports to others within their team, department, and across the organization
- Support training and development needs to create a best‑in‑class SRE team
- Establish operational objectives, policies, and procedures
- Interact regularly with management on matters concerning multiple functional areas, departments, and/or customers
- Liaise with business associates, infrastructure engineers, software engineers, and Cboe management
- Platform Configuration Management & Project Oversight:
- Develop and manage operational initiatives to deliver tactical results
- Translate functional plans into operational processes and guide execution, providing project management support for all updates applicable to platforms of responsibility
- Provide configuration management of new and existing trading platforms and support implementation of new features and functionality based on new business requirements
- While the primary focus of this role involves support of bare‑metal on‑premises infrastructure, experience with cloud platforms (e.g., AWS, Azure, GCP) and containerization technologies (e.g., Docker, Kubernetes) is desirable
- Monitor development activities and change management tickets, and evaluate their impact on Cboe Operations
- Approve and execute daily change tickets assigned to Site Reliability Engineering
- Organize testing of changes prior to deployment and work with software engineering to resolve systemic issues
- Demonstrate knowledge of Compliance obligations impacting regulated platforms and work closely with Compliance staff to ensure incident triage, reporting, and remediation obligations are met
- Incident Response & Escalation Management:
- Serve as the senior escalation point for production incidents across European and GTH market hours
- Coordinate incident triage, root cause analysis, and resolution across globally distributed engineering and operations teams
- Provide timely, precise communication to stakeholders during active incidents and drive post‑incident reviews and remediation tracking to deliver long‑term platform stability
- Subject…
Position Requirements
10+ years of work experience