Site Reliability Engineer - Observability @ Bank
Listed on 2026-02-16
IT/Tech
IT Support, Cybersecurity
Discover ING Bank Romania
ING believes in a world where everyone has the right to grow and progress in their own way. We express this in our global tagline, “do your thing”. Perhaps more than in any other large company, we extend our belief in the power of autonomy to our own people. But there’s a catch. In return for great freedom, we expect people to do great things for our customers, our stakeholders, and ING at large.
To work here is to be surrounded by people who are energetic, ambitious, friendly and respectful: talented specialists who take the responsibility and autonomy to make great things happen. We stay curious, thrive on change, and seek new and better ways to make it happen. Active in Romania for 30 years, ING Bank pioneered and challenged the local banking industry. Technology and innovation are at the core of what we do, making our products relevant for our customers’ lives and businesses.
ING Bank Romania is the only bank among the top 10 local banks by assets to have grown organically, without acquiring client portfolios or other banks. ING Bank Romania is a universal bank with more than 1.8 million customers across three business segments: individuals (retail), SME and Mid-Corporate companies, and Wholesale Banking.
Join us!
Mission
The SRE team is responsible for rolling out SRE (Site Reliability Engineering) practices to improve the reliability of Critical Business Services for ING Bank Romania. The team defines, introduces, and promotes SRE processes and practices such as Observability, Incident & Problem Management, Capacity & Performance Management, IT Service Continuity, Well-Architected Review Framework, Operational Resilience & Reliability Testing, Release Procedures & Change Management, and Reliability Reporting & Error Budgeting.
This role is responsible for ‘Observability’ to ensure full visibility into system health, proactive risk identification, and highly efficient incident response.
As part of the SRE team, you will:
Develop, innovate, mature & implement Observability practices and related operational processes in close cooperation with the Global SRE Observability domain.
Adopt global standards for Observability and ensure proper documentation, training material, and knowledge artifacts are available for the engineering community within ING Bank Romania.
Act as the Observability expert for operational activities involving critical applications and infrastructure, including participating in global incident investigations, supporting root‑cause analysis with high‑quality telemetry insights, and advising on weaknesses and improvements across Tech domains.
Contribute to the end‑to‑end reliability strategy through continuous improvement of monitoring coverage, alert efficiency, platform scalability, and telemetry data quality.
The initial focus will be to lead the implementation of Global Observability standards and tooling within ING Bank Romania, ensuring alignment with global platforms and strategies. The remaining activities include:
Perform Observability maturity assessments, identify gaps in logging/metrics/tracing/alerting practices, document findings, and ensure improvement items are prioritized and implemented.
Drive adoption of global SRE Observability best practices by collaborating with Tech teams to standardize dashboards, alerts, instrumentation patterns, and telemetry pipelines; ensure proper documentation is created and maintained.
Improve service resilience by designing and implementing high‑quality observability for Critical Business Services, enabling proactive detection of risks and faster incident response.
Provide technical expertise in solving complex reliability issues where Observability tooling or insights are required.
Provide guidance on Observability KPIs and SLIs/SLOs, and ensure their integration into local and global dashboards.
Ensure accurate reporting and data quality for Observability metrics, contributing feedback to global tooling teams where needed.
Identify tooling or process improvements and drive their adoption (e.g., alert noise reduction programs, dashboard unification, MTTD/MTTR improvements).
Build…