Observability Engineering Lead
Listed on 2026-06-13
Software Development
DevOps, Software Engineer, Cloud Engineer - Software, Backend Developer
Location: London
Corporate Title: Vice President
The Enterprise Observability Engineering team is at the heart of Deutsche Bank’s technology transformation. Our mission is to provide a unified, scalable, and resilient observability platform that delivers deep insights into the health and performance of our critical applications and infrastructure. By leveraging open standards and best‑in‑class tooling, we empower our engineering community to build, run, and innovate with confidence. This role is pivotal in our strategy to adopt OpenTelemetry (OTEL) as the standard for telemetry data collection across the enterprise.
You will work closely with the lead Observability Architect to translate high‑level architectural vision into actionable, robust, and compliant engineering designs. You will be instrumental in delivering the supervisory plane based on the Open Agent Management Protocol (OpAMP) and the fleet of OTEL data collection agents, ensuring they are performant, scalable, and adhere to the Bank’s standards.
What We’ll Offer You
- Hybrid Working – a model that enables eligible employees to work remotely for part of their working time and find a working pattern that suits them
- Competitive salary and non‑contributory pension
- 30 days’ holiday plus bank holidays, with the option to purchase additional days
- Life Assurance and Private Healthcare for you and your family
- A range of flexible benefits including Retail Discounts, a Bike4Work scheme and Gym benefits
- The opportunity to support a wide‑ranging CSR programme, plus 2 days’ volunteering leave per year
Key Responsibilities
- Collaborate with the Observability Architect to translate strategic architecture into detailed, practical designs for the OTEL collection tier and lead the engineering efforts to build, test, and deploy these solutions. Architect and build the collection pipeline to handle massive data volumes from thousands of applications and hosts across the globe.
- Guide the global engineering team in the development and lifecycle management of the OpenTelemetry data collection agents and the OpAMP-based supervisory control plane for large‑scale agent management.
- Serve as the subject matter expert for OTEL implementation, ensuring all solutions strictly adhere to Deutsche Bank’s internal technology standards, security policies, and regulatory guidelines (e.g., BaFin, MAS, GDPR).
- Design and implement mechanisms within the observability pipeline to manage and respect regional data protection and data sovereignty requirements, ensuring data is processed in the correct geographical jurisdictions.
- Provide technical guidance and mentorship to a globally distributed team of engineers. Foster a culture of engineering excellence, collaboration, and innovation. Develop and maintain dashboards and reports on the health, adoption, and performance of the OTEL ecosystem.
- Champion best practices for CI/CD, automated testing, and release management. Own the debugging and resolution of complex technical issues across the observability stack.
Skills and Experience
- Proven, hands‑on experience designing and implementing solutions using the OpenTelemetry (OTEL) framework, including collectors, exporters, and instrumentation libraries.
- Extensive hands‑on experience with at least one enterprise‑grade observability tool (e.g., New Relic, AppDynamics, Dynatrace, ITRS Geneos) covering both APM and Infrastructure Monitoring.
- Demonstrable experience successfully engineering, deploying, and managing open‑source based solutions within a large, complex enterprise environment.
- Familiarity with agent control planes and management protocols; direct experience with OpAMP is a significant advantage.
- Strong software engineering background with proficiency in multiple languages, such as Go and Python. Experience with Java and/or Node.js for instrumentation is highly valuable. Practical experience with key components of the modern observability ecosystem, including Prometheus and Grafana. Solid understanding of high‑throughput messaging systems such as Apache Kafka.
- Debugging and analytical skills, with the ability to troubleshoot complex performance…