IB CTO Team - Site Reliability Engineer; SRE - Assistant Vice President
Cary, Wake County, North Carolina, 27518, USA
Listed on 2026-05-30
IT/Tech
Cloud Computing, SRE/Site Reliability, Systems Engineer, IT Support
Job Description:
Job Title IB CTO Team - Site Reliability Engineer (SRE)
Corporate Title Assistant Vice President
Location Cary, NC
Who we are
In short – an essential part of Deutsche Bank’s technology solution, developing applications for key business areas.
Our Technologists drive Cloud, Cyber and business technology strategy while transforming it within a robust, hands‑on engineering culture. Learning is a key element of our people strategy, and we have a variety of options for you to develop professionally. Our approach to the future of work champions flexibility and is rooted in the understanding that there have been dramatic shifts in the ways we work.
Having first established a presence in the Americas in the 19th century, Deutsche Bank opened its US technology center in Cary, North Carolina in 2009.
Overview
We are looking for a Site Reliability Engineer (SRE) to join our global team. This role will focus on ensuring the operational health, reliability, performance, and scalability of the CARE platform and multi‑tenant applications, encompassing Google Cloud Platform (GCP) and on‑prem infrastructure, application deployment, and the underlying CARE services. You will be instrumental in defining and implementing SRE best practices to maintain a highly available and resilient platform.
As a senior IB SRE, you will be crucial in ensuring the continuous operation and improvement of the platform.
What We Offer You
A diverse and inclusive environment that embraces change, innovation, and collaboration
A hybrid working model, allowing for in‑office / work from home flexibility, generous vacation, personal and volunteer days
Employee Resource Groups support an inclusive workplace for everyone and promote community engagement
Competitive compensation packages including health and wellbeing benefits, retirement savings plans, parental leave, and family building benefits
Educational resources, matching gift and volunteer programs
What You’ll Do
Platform Reliability and Performance:
Proactively monitor, troubleshoot, and resolve issues related to platform availability, performance, and capacity on both GCP and on‑prem infrastructure
Operational Excellence:
Develop, implement, and maintain SRE best practices, including incident response, post‑mortems, root cause analysis, and proactive problem prevention
Automation and Tooling:
Drive automation efforts to reduce manual toil across operational tasks, deployment, scaling, and recovery. This includes developing and improving monitoring, alerting, and self‑healing systems
SLI/SLO Management:
Define, monitor, and report on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key platform services, working to continuously improve them
Collaboration and Support:
Liaise with application teams (tenants) to understand their operational needs, provide guidance on platform best practices for reliability and capacity planning, and assist with complex troubleshooting
Security and Compliance:
Collaborate with security teams to ensure the platform adheres to security policies and compliance requirements, focusing on operational security aspects
Skills You’ll Need
Strong understanding of SRE principles and practices, including SLOs/SLIs, incident management, post‑mortems, and toil reduction
Deep understanding of GCP services such as GKE, Identity and Access Management (IAM), identity services, Cloud SQL, Cloud Monitoring, Cloud Logging, and related operational aspects
Extensive experience with Kubernetes and container orchestration, including configuration, troubleshooting, and performance tuning. Experience with Service Mesh (e.g., Istio) is highly desirable
Experience with monitoring and alerting tools (e.g., Prometheus, Grafana, Splunk, Google Cloud Monitoring) and defining effective alerts and dashboards
Solid experience with Git and GitHub, including Git workflows for managing code, and with deployment tooling such as ArgoCD for deployments and managing application life cycles
Programming/scripting (e.g., Python, Go, Java, Bash) and Infrastructure as Code (e.g. Terraform) experience for automation, tooling development,…