Secure Reliability Engineering Manager
Listed on 2026-02-16
IT/Tech
Cybersecurity, Cloud Computing
We help the world run better
At SAP, we keep it simple: you bring your best to us, and we'll bring out the best in you. We're builders touching over 20 industries and 80% of global commerce, and we need your unique talents to help shape what's next. The work is challenging – but it matters. You'll find a place where you can be yourself, prioritize your wellbeing, and truly belong.
What's in it for you? Constant learning, skill growth, great benefits, and a team that wants you to grow and succeed.
“Due to the potentially classified nature of our work, you must be willing to undergo a governmental security clearance process.”
Overview
We are seeking an experienced Secure Reliability Engineering (SRE) Manager to lead the reliability, resilience, and secure operation of a sovereign cloud platform supporting regulated and high-trust workloads. This role is responsible for ensuring that availability, performance, and security are engineered into the platform by design, using Terraform-driven Infrastructure as Code (IaC), cloud-native services, and open-source technologies.
The ideal candidate brings deep technical credibility in cloud reliability engineering, strong people leadership, and a security-first mindset—treating security, compliance, and sovereignty as core reliability requirements, not afterthoughts.
Key Responsibilities
Platform Reliability & Architecture
Own the reliability, availability, and resilience of sovereign cloud platforms supporting regulated workloads across hyperscalers (AWS, Azure, GCP, and sovereign variants)
Design and enforce secure information and failure boundaries, including:
Network segmentation and fault isolation
Identity, access, and privilege separation
Data residency, encryption, and key management controls
Define and manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets aligned with sovereign and regulatory requirements
Partner with Security, Architecture, and Compliance teams to ensure reliability designs meet sovereignty, regulatory, and contractual obligations
Lead development and governance of Terraform-based IaC frameworks with reliability and security baked in
Establish reusable modules, standards, and pipelines for:
Cloud-native services (compute, storage, networking, identity)
Built-in resilience patterns (multi-zone, multi-region, failover)
Embedded security and compliance controls
Drive automation for:
Provisioning and configuration
Drift detection and remediation
Capacity management and lifecycle operations
Build and operate reliability-focused CI/CD pipelines for infrastructure and platform services
Lead operational practices including:
Monitoring, logging, tracing, and alerting
Incident response, root cause analysis, and post-incident reviews
Change, release, and reliability risk management
Reduce toil through automation while maintaining strict security and change controls
Implement security-by-default and resilience-by-design practices across all environments
Ensure operational alignment with frameworks such as:
Zero Trust architecture
NIST, ISO, SOC, or equivalent regulatory standards
Support audits and assessments by delivering traceable, code-driven controls, operational evidence, and reliability metrics
Treat compliance gaps, security weaknesses, and reliability risks as production-impacting issues
Govern and operate cloud-native and open-source platforms such as:
Kubernetes, Helm, Argo, Vault, Open Policy Agent
Ensure platforms are secure, observable, resilient, and supportable
Evaluate emerging technologies that improve reliability, security posture, and operational efficiency
Lead, mentor, and grow a team of Secure Reliability Engineers
Establish an SRE culture focused on:
Blameless incident response
Continuous improvement
Strong operational ownership
Define clear roadmaps, reliability goals, and success metrics aligned with business and sovereign requirements
10 years of experience in SRE, DevOps, Cloud…