Senior Site Reliability Engineer
London, Greater London, W1B, England, UK
Listed on 2026-06-07
IT/Tech
SRE/Site Reliability, Systems Engineer, IT Support, Cloud Computing
The Role
This role sits in the core Platform/SRE team that owns production. You’ll work directly on incident response, on-call, system reliability, and day-to-day operations for Heidi’s platform.
What you’ll do
Participate in on-call and incident response: Respond to production incidents, contribute to service restoration, and support clear communication during incidents. Over time, take increasing responsibility for leading incidents end-to-end.
Improve operational reliability: Identify recurring issues and reliability risks, and drive fixes through better alerting, automation, system changes, or process improvements.
Own parts of the production environment: Operate and improve Kubernetes clusters, cloud infrastructure, and core platform services, with growing ownership as familiarity increases.
Strengthen observability: Improve dashboards, alerts, logs, and traces so issues are detected earlier and diagnosed faster, with a strong focus on actionable signals.
Reduce operational toil: Automate repetitive tasks, simplify runbooks, and improve tooling to make on-call and day-to-day operations easier and safer.
Support safe change: Improve deployments, rollback mechanisms, and operational readiness to reduce the risk of incidents caused by change.
Contribute to operational practices: Write and maintain runbooks, participate in blameless post-mortems, and help improve incident response processes over time.
Collaborate closely with engineers: Work with product and feature teams to improve production readiness, service ownership, and reliability expectations.
3–6+ years in SRE, DevOps, Platform, or operations-heavy engineering roles.
Experience supporting production systems and participating in on-call rotations.
Comfortable debugging live systems under pressure.
Experience operating cloud infrastructure (AWS preferred).
Working knowledge of Kubernetes and containerised workloads.
Infrastructure as Code experience (Terraform or similar).
Familiarity with monitoring and alerting tools (Datadog, Prometheus, etc.).
Scripting or automation experience (Python, Bash, or similar).
Experience leading incidents or mentoring others during on-call.
Experience in regulated or security-sensitive environments.
Familiarity with databases, queues, and caches in production.
Interest in reliability practices such as SLOs, error budgets, and capacity planning.
We own production: The Platform/SRE team is responsible for reliability and incident response.
Incidents are blameless: We focus on learning and improving systems, not assigning fault.
Practical over perfect: We prioritise improvements that reduce real operational pain.
Calm under pressure: Clear thinking and communication matter during incidents.
Real product momentum. We're not trying to generate interest, we're channeling it.
Equity from day one. When Heidi wins, you win. You'll share directly in the success you help create.
Unmatched impact. Play a pivotal role at a critical growth moment - working on a product that delivers tangible, real-world value to clinicians and patients every day.
Work alongside world-class talent. Join a team of operators and builders who've scaled unicorns.
Your health, covered. Comprehensive private medical and dental cover through Bupa, plus 24/7 mental health, coaching and wellbeing support through Sonder and a £100/month Healthy Heidi’s stipend.
Global parental leave. 26 weeks paid for primary carers and 18 weeks for secondary carers, subject to eligibility.
Fertility support. £7,000 one-off payment; eligibility applies.
Learning & development. £700 per year for courses, books, memberships, conferences and more.
Home office budget. £500 one-off to set up a workspace you actually want to work in.
Recharge days. After major milestones and busy periods, so you can reset and come back strong.
Work from anywhere. For up to 4 weeks per year, wherever the world takes you.
Clinical leave. 10 days per year for eligible clinical roles to maintain accreditation and requirements.
Flexibility that works. A hybrid environment, with 3 days in the office.
Heidi’s commitment to Diversity, Equity and Inclusion
Heidi is dedicated to creating an equitable, inclusive, and supportive work environment that brings people together from diverse backgrounds, experiences, and perspectives. Our strength is in our differences. We're proud to be an equal opportunity employer and welcome all applicants, as we're committed to promoting a culture of opportunity for all.