Principal Site Reliability Engineer
Listed on 2026-02-09
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Overview
Orgvue is a leading organizational design and planning software platform that captures the power of data visualization and modelling to build more adaptable and better-performing organizations. HR, finance, and business leaders use Orgvue for actionable insight and analysis that helps them make faster workforce decisions in a constantly changing world.
Orgvue is used by the world’s largest and best-known enterprises and management consulting firms to visualize and confidently build the businesses they want tomorrow, today. The company is headquartered in London, with offices in Philadelphia, The Hague, Toronto, and Sydney.
Role
We are seeking a Principal Site Reliability Engineer who will be a senior technical leader focused on scaling and hardening our AWS- and Kubernetes-based infrastructure.
Responsibilities
- Define and enforce SLOs, SLIs, and error budgets across critical services
- Craft and implement a cloud infrastructure and tooling strategy
- Work across the org to level up SRE practices
- Help implement robust observability (metrics, logs, and traces) using our observability tool
- Guide the team in building automated, self-healing systems
- Own and evolve our incident response processes, including on-call practices and post-mortem culture
- Mentor engineers across the org on best practices in reliability, operational readiness, and scalable infrastructure
- Drive Infrastructure as Code (IaC) using Terraform, Kubernetes, CloudFormation, and GitOps practices
- Collaborate closely with security, DevOps, and software teams to ensure compliance, scalability, and operational excellence
- Evaluate and introduce tools, patterns, and practices that improve the performance and reliability of our SaaS platform
Requirements
- Demonstrable experience leading SRE transformations
- Deep hands-on expertise with Kubernetes (EKS preferred) in production environments
- Strong experience with AWS core services (EC2, EKS, RDS, S3, ALB/NLB, IAM, CloudWatch, etc.)
- Expert in Infrastructure as Code using tools such as Terraform, with knowledge of GitOps workflows
- Strong background in observability: metrics, visualization, logging, and tracing
- Understanding of automation, SDLC, CI/CD pipelines, deployment automation, and blue/green or canary releases
- Proven experience with incident management, disaster recovery planning, root cause analysis, and post-incident reviews
Benefits
- Hybrid working - 1+ days a week in the London office
- Wellbeing: Sanctus Coaching, Virtual fitness sessions, Wellbeing webinars, Annual Wellbeing day
- Subsidised Gym Membership
- Private Medical Insurance (including Dental and Vision) and Life Assurance
- 25 days holiday (increasing to 30 days at a rate of 1 extra day per year)
- Summer Fridays (half-day Fridays for the months of July and August)
- Employer pension contribution of 5% of your gross salary, if you contribute a minimum of 3%
- Season Ticket Loan
- Cycle to Work Scheme
- Annual Discretionary Bonus
Here at Orgvue, we promote individualism and a diverse workforce to build on our future success.