Site Reliability Engineer
Listed on 2026-06-02
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Project Manager
The Site Reliability Engineer builds solutions that support Platform disaster response and crisis management in compliance with Engineering and Customer requirements. The role also helps provide and coordinate disaster preparedness for the organization’s Platform to help ensure business continuity.
They also ensure we have enough resources to meet current and future Platform demand efficiently. This involves forecasting needs, planning capacity, monitoring performance (KPIs), managing risks (shortages and overloads), and developing strategies for optimisation.
Your impact:
- Work with Engineering & Service Management to ensure that the disaster recovery (DR) and capacity plans drive DR strategy and procedures in both Cloud and DC venues.
- Build out tooling that supports the DR plans and tracks progress and maturity against set KPIs and metrics.
- Work with Engineering & Service Management to ensure that disaster recovery solutions are adequate, in place, maintained, and tested as part of the regular operational life cycle.
- Provide ongoing feedback for risk management, mitigation, and prevention.
- Develop and implement capacity planning tooling, frameworks, policies, and strategies.
- Provide capacity requirements and impact assessments for new services or changes.
- Collaborate with other Platform managers to deliver objectives on our platform evolution roadmap.
Your skills and experience:
- Experience of Linux administration
- Experience or strong understanding of Kubernetes
- Comfort with a scripting language suitable for automation tasks
- Understanding of current recovery solutions and high availability architectures for cloud and on-prem environments
- Understanding of Capacity Management & Planning scenarios and tooling
- Experience with Agile principles and practices
- Expertise in problem diagnosis across complex, distributed systems
- Experience supporting SaaS products
- Experience using AI for Automation
- Experience with Incident Management, Post Mortems and related practices
- Knowledge of observability and monitoring best practices
- Experience operating within one or more public clouds (AWS, GCP, Azure)
- Experience with configuration management and infrastructure as code
As set forth in Anaplan’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.
We will ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, perform essential job functions, and receive equitable benefits and all privileges of employment. Please contact us to request accommodation.