Principal Platform Site Reliability Engineer SASE Cloud Platforms
Listed on 2025-12-22
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
Company Description
Our Mission
At Palo Alto Networks® everything starts and ends with our mission:
Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we’re looking for innovators who are as committed to shaping the future of cybersecurity as we are.
Who We Are
We believe collaboration thrives in person. That’s why most of our teams work from the office full time, with flexibility when it’s needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.
Job Description
Your Career
Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability.
Central Infrastructure & Platform Engineering Team | Santa Clara, CA (Hybrid/Onsite as applicable)
We’re hiring a Sr Staff Platform SRE for our SASE central cloud platform team. We’re looking for a well‑rounded platform SRE who can architect, build, and operate cloud‑native infrastructure at very large scale across GCP, AWS, and OCI.
This is a unique opportunity to operate at enormous scale: the platforms you’ll influence are tied to hundreds of millions of dollars of annual cloud spend, and the work you do will directly impact reliability, efficiency, developer velocity, and operational excellence across the organization.
Your Impact
- Act as an architect for infrastructure owned by the team: plan ahead and design in line with scale requirements.
- Design, develop, and execute infrastructure components for the platforms owned by the team.
- Own Infrastructure as Code (IaC), Monitoring as Code (MaC), and Policy as Code (PaC) components, and build the golden path for future platforms with best practices.
- Strive for autonomy with an automation‑first mindset, including modern AI‑driven approaches.
- Redefine and continuously update modern CI/CD practices for cloud‑native workloads.
- Perform on‑call duties and reduce on‑call toil through automation, AI agents, analyzers, and self‑healing patterns.
- Support internal platform users as a forward‑deployed engineer, close the feedback loop, and modernize the platform based on user needs.
- Maintain a security‑first mindset without compromising reliability and operability.
- Design cost‑effective infrastructure solutions across AWS, GCP, and OCI, including cost governance, capacity planning, and efficiency improvements.
Your Experience
- BS or MS in Computer Science, a related field, or equivalent professional experience.
- Expert knowledge of Kubernetes and CNCF ecosystem tools such as Helm, Prometheus, Backstage, Istio, and Crossplane.
- Strong mastery of Terraform: building reusable modules and designing complex infrastructure offerings that operate in protected/restricted environments.
- Strong foundational knowledge of operating and scaling cloud‑native workloads using KEDA, Karpenter, NAP, etc.
- Ability to architect CI/CD infrastructure for cloud‑native workloads (primarily Go and Python) and build DevSecOps pipelines.
- Programming skills in Go and Python, and scripting experience with Bash.
- Strong knowledge of Argo CD, including controlling and scaling thousands of deployments across Kubernetes and multiple clouds.
- Deep experience in cost governance and optimization at scale, including allocation models, anomaly detection, efficiency recommendations, and guardrails across cloud and Kubernetes workloads.
- Ability to diagnose and troubleshoot complex distributed systems handling high‑volume transactions.
- Excellent written and verbal communication, able to collaborate and rally support.
- Self‑disciplined, self‑managed, and self‑motivated, with a strong sense of ownership, urgency, and drive.
- Strong communication skills and the ability to partner across platform, security, and application engineering teams.
The Team
Ou…