Principal Cloud Infrastructure Engineer; Advanced Threat Protection Job, Santa Clara area, California, USA, IT/Tech

Position: Principal Cloud Infrastructure Engineer (Advanced Threat Protection)

Job Summary:

Palo Alto Networks is at the forefront of cloud-native infrastructure, where reliability, scale, and intelligent automation define the future of operations. As a Senior Site Reliability Engineer, you will design and operate the platforms that power our applications across GCP, AWS, and global data centers - and you'll push the boundary of what's possible by leveraging AI and machine learning to transform how we approach SRE.

This isn't just about keeping the lights on. You'll build intelligent systems that predict incidents before they happen, automate root cause analysis, and continuously optimize our infrastructure. You'll be a critical bridge between engineering and our Infrastructure Platform, combining deep SRE expertise with AI-driven automation to deliver unprecedented levels of reliability and operational efficiency.

If you're excited about applying AI to real-world infrastructure challenges - and you thrive in an environment where automation isn't just a nice-to-have but a core philosophy - this is your next career move.

Your Impact

Design, build, and operate cloud infrastructure that enables reliable, rapid deployment of microservices with resilient operations and effective monitoring
Leverage AI/ML to automate incident detection, root cause analysis, and remediation - reducing toil and accelerating mean time to resolution
Build and integrate AI-powered tools (LLM-based agents, AIOps platforms) into SRE workflows for intelligent alerting, log analysis, and capacity planning
Write automation code for provisioning and operating infrastructure at massive scale
Develop self-healing systems that can automatically detect anomalies, diagnose issues, and take corrective action with minimal human intervention
Work with development teams to ensure applications are production-ready, scalable, and reliable from the ground up
Identify and drive opportunities to improve automation for code deployment, management, and observability of application services
Establish end-to-end monitoring and alerting on all critical components, incorporating AI-driven anomaly detection and predictive analytics
Participate in the on-call rotation supporting the platform and production applications
Lead root cause analysis of critical business and production issues, building runbooks and automation to prevent recurrence
Mentor other SREs on best practices in infrastructure orchestration, production troubleshooting, and AI-augmented operations
Represent SRE in design reviews and work cross-functionally with engineering teams on operational readiness

Qualifications

7+ years of experience in DevOps, Site Reliability, or infrastructure engineering
Expertise in multi-cloud environments - strong hands-on experience with GCP and AWS, and familiarity with OCI (Oracle Cloud Infrastructure)
Experience designing and operating infrastructure across multiple cloud providers, including networking, identity management, and cross-cloud connectivity
Expertise in Infrastructure as Code with tools such as Terraform and Ansible
Strong proficiency in Python and shell scripting for automation
Strong experience with Linux and distributed systems handling high-volume transactions
Familiarity with CI/CD pipelines, GitLab, and Artifactory
Strong fundamentals in HTTP, web servers, and networking
BS or MS in Computer Science, a related field, or equivalent professional experience
Excellent problem-solving, critical thinking, communication, and teamwork skills
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
Experience applying AI/ML to operational workflows (AIOps, intelligent alerting, automated remediation, or LLM-powered tooling) is a strong plus
Experience with cloud compliance frameworks (FedRAMP, IL5) and operating in regulated environments is a plus
Experience building and managing large database systems - relational (MySQL, PostgreSQL) and non-relational (Redis, BigQuery, etc.) - is a plus

Compensation Disclosure

The compensation offered for this position will depend on qualifications, experience, and work location. For candidates who receive an offer at the posted level, the starting base salary (for…