Site Reliability Engineering Lead
Job Description
What is the opportunity?
The Lead Site Reliability Engineer will be responsible for spearheading the development and implementation of Site Reliability Engineering solutions for all applications within City National Bank (CNB), an RBC company. To succeed in its mandate, this team will work collaboratively with teams across several lines of business and with other Technology and Operations partners. This individual will need advanced knowledge and experience working in an application development, support and/or technology operations organization.
They should be able to take on a production support role and partner with the SRE team in Commercial Banking.
- Own reliability for critical services, including availability, latency, performance and resilience.
- Design and maintain observability solutions (metrics, logs, traces, dashboards, alerts) with a strong signal-to-noise focus.
- Build and maintain automation for operational tasks (self-healing, runbooks, remediation, deployments, and diagnostics).
- Partner with engineering teams to enhance reliability through design reviews, production readiness reviews, and failure-mode analysis.
- Define, implement and operationalize SLIs, SLOs and error budgets aligned to business expectations.
- Drive blameless postmortems, identify systemic issues, and ensure corrective actions are implemented and tracked.
- Improve change management practices to reduce deployment risk.
- Perform capacity planning, load testing, and performance analysis to prevent incidents before they occur.
- Contribute to DR and resilience strategies.
- Mentor junior engineers and help establish consistent SRE standards and best practices across teams.
- Bachelor’s degree in Computer Engineering, Computer Science or equivalent practical experience.
- 6+ years of related experience in SRE, DevOps and/or production engineering roles.
- Advanced knowledge of industry standard SRE best practices.
- Hands-on experience operating highly available systems in production.
- Proficiency in at least one programming or scripting language (Python, Go, Bash, PowerShell).
- Advanced experience in a variety of environments (Linux, Windows, Databases, Cloud, and Services/APIs).
- Hands-on experience designing, operating, and troubleshooting message queue-based systems (e.g., Kafka, RabbitMQ, ActiveMQ, cloud-managed services).
- Experience supporting and operating API platforms and gateways (e.g., Apigee, MuleSoft).
- Deep experience with monitoring and alerting systems, and OpenTelemetry or unified telemetry pipelines.
- Experience with CI/CD pipelines and deployment automation.
- Solid understanding of networking, load balancing, DNS and TLS fundamentals.
- Excellent communication skills with a direct style.
- Experience with cloud platforms (AWS, Azure) and cloud-native architectures.
- Experience with containers and orchestration platforms (e.g., Kubernetes, ECS, AKS).
- Familiarity with infrastructure as code (e.g., Terraform, CloudFormation).
- Experience integrating reliability tooling with ticketing, paging, and incident management systems.
- Consumer banking experience.
We thrive on the challenge to be our best, progressive thinking to keep growing, and working together to deliver trusted advice to help our clients thrive and communities prosper. We care about each other, reaching our potential, making a difference to our communities, and achieving success that is mutual.
- A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, commissions, and stock where applicable.
- Leaders who support your development through coaching and managing opportunities.
- Ability to make a difference and lasting impact.
- Work in a dynamic, collaborative, progressive, and high-performing team.
- A world-class training program in financial services.
Job Skills: Agile Methodology, Group Problem Solving, IT Systems Integration, Organizational Leadership, Product Services, Software Development Life Cycle (SDLC), System Applications, System Integration Testing (SIT), Systems Software
Additional Job Details
Address: RBC CENTRE, 155 WELLINGTON ST W:
T…