Senior Engineer - Site Reliability Engineering
Listed on 2026-06-18
IT/Tech
SRE/Site Reliability, Cloud Computing: Infrastructure & Operations
We are evolving our Site Reliability Engineering capabilities to strengthen reliability, observability, security, and operational excellence across our Markets and Risk Intelligence division.
As a Senior SRE, you will be a senior, hands‑on technical contributor helping shape the foundations of reliability across both new and existing platforms. You will collaborate with Architecture, Engineering, Security, and Platform teams to ensure reliability is built into systems from day one. While this is not a people‑management or shift‑based role, you will work closely with global teams and may occasionally be called upon for major incidents or critical issues.
This position requires a highly proactive, hard‑working expert with strong leadership presence and ownership of platform reliability outcomes.
- Lead the establishment of SRE foundations for new projects: building environments, setting up monitoring and alerting, and ensuring operational readiness from day one.
- Define, implement, and champion observability standards, tooling, and guidelines across metrics, logs, traces, and SLIs/SLOs.
- Design and evolve monitoring and alerting solutions that improve visibility, reduce toil, and strengthen system health.
- Continuously drive reliability improvements across our environments through incident reduction, performance tuning, and building resilient patterns.
- Partner with Security teams to ensure our platforms meet compliance, security, and risk‑management expectations.
- Lead seamless handovers from project delivery into BAU SRE operations by ensuring documentation, readiness, and strong operational practices.
- Be a technical leader and mentor: supporting engineers, shaping engineering standards, and fostering a culture of learning and development.
- 5+ years of hands‑on technical experience in SRE, Platform Engineering, Infrastructure, or related roles.
- Strong experience with Azure, including services such as AKS, Azure Container Apps, Virtual Machines, virtual networking (VNet), Azure Active Directory (Entra), and Azure managed services.
- Hands‑on experience with Kubernetes and containerized platforms.
- Proven experience designing and operating observability platforms, including monitoring, logging, and alerting.
- Hands‑on experience with Datadog for metrics, logs, APM, and alerting.
- Strong understanding of SRE principles, including SLOs, error budgets, incident management, and reliability engineering.
- Understanding of cloud security principles and experience collaborating with security teams.
- Experience with cloud cost optimization strategies and tooling.
- Experience with, or working knowledge of, AWS.
- Experience supporting multi‑cloud or hybrid environments.
- Exposure to Infrastructure as Code (e.g., Terraform, CloudFormation).
- Experience in large‑scale, complex, or regulated environments.
- Knowledge of vector databases and RAG architectures for building internal SRE knowledge assistants.
- Knowledge of Generative AI and LLM platforms (e.g., Claude, Amazon Bedrock).
- Experience integrating AI with observability stacks (Prometheus, Grafana, ELK, OpenTelemetry) for proactive issue detection.
- Strong technical authority with the ability to influence design and operational decisions.
- Highly collaborative, comfortable working across architecture, engineering, security, and operations teams.
- Calm and methodical under pressure, especially during incidents and critical issues.
- Pragmatic problem‑solver who balances reliability, security, cost, and delivery speed.
- Clear communicator, able to explain complex technical concepts to diverse audiences.
Senior Associate
Equal Opportunity Statement
We are proud to be an equal opportunities employer. This means that we do not discriminate on the basis of anyone’s race, religion, colour, national origin, gender, sexual orientation, gender identity, gender expression, age, marital status, veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we can reasonably accommodate applicants’ and employees’ religious practices and beliefs, as well as mental health or physical disability needs.