Principal Engineer Software

Job in Santa Clara, Santa Clara County, California, 95054, USA
Listing for: Palo Alto Networks
Full Time position
Listed on 2026-07-01
Job specializations:
  • IT/Tech
    Systems Engineer
Job Description & How to Apply Below
**Our Mission**
At Palo Alto Networks®, we're united by a shared mission: to protect our digital way of life. We thrive at the intersection of innovation and impact, solving real-world problems with cutting-edge technology and bold thinking. Here, everyone has a voice, and every idea counts. If you're ready to do the most meaningful work of your career alongside people who are just as passionate as you are, you're in the right place.

**Who We Are**
In order to be the cybersecurity partner of choice, we must trailblaze the path and shape the future of our industry. This is something our employees work at each day and is defined by our values: Disruption, Collaboration, Execution, Integrity, and Inclusion. We weave AI into the fabric of everything we do and use it to augment the impact every individual can have. If you are passionate about solving real-world problems and ideating beside the best and the brightest, we invite you to join us!

We believe collaboration thrives in person. That's why most of our teams work from the office full time, with flexibility when it's needed. This model supports real-time problem-solving, stronger relationships, and the kind of precision that drives great outcomes.

**Job Summary**
**Sr Staff Data Center & OpenShift Operations Engineer**

**Position Overview**
The **Senior Data Center Operations Engineer** is responsible for the bedrock of our high-availability infrastructure. This role bridges the gap between physical hardware and the **Red Hat OpenShift Container Platform (OCP)**. Your mission is to ensure **99.99% availability** by architecting resilient physical layouts and automating the deployment, scaling, and self-healing capabilities of our production clusters.

**Key Responsibilities**
+ **High-Availability (HA) Infrastructure:** Monitor and maintain data center systems with a focus on "Zero Single Point of Failure" (ZSPoF) architecture for OpenShift control planes and worker nodes.
+ **Cluster Reliability Engineering:** Implement and manage OpenShift 4.x clusters across multiple power and cooling zones to ensure 99.99% uptime.
+ **Disaster Recovery & Business Continuity:** Design, test, and execute automated failover strategies and backup/restore procedures using tools like **OADP (Velero)** and **Red Hat ACM**.
+ **Automated Maintenance:** Perform routine maintenance and upgrades using **GitOps (ArgoCD)** and the **Machine Config Operator** to ensure zero-downtime node evacuations and patching.
+ **Complex Troubleshooting:** Resolve deep-stack hardware and software issues, from faulty GPU firmware to OpenShift SDN (OVN-Kubernetes) network latencies.
+ **Vendor & Lifecycle Management:** Coordinate with vendors for specialized hardware (e.g., NVIDIA, Dell, Cisco) while maintaining strict security and firmware compliance.
+ **Efficiency & Capacity Architecture:** Optimize rack density for high-performance GPU clusters while managing thermal loads and power distribution (PDU) to prevent circuit-trip outages.
+ **Observability Implementation:** Maintain accurate documentation and integrate hardware health metrics (IPMI/SNMP) into **Prometheus/Grafana** for proactive alerting.
+ **Physical Deployment:** Rack and stack high-density GPU servers, ensuring redundant power-pathing and high-speed (100G/200G) InfiniBand or Ethernet cabling.
+ **Hardware Lifecycle:** Perform precision physical installation and replacement of critical components (CPUs, GPUs, NVMe storage) in a live production environment without impacting cluster quorum.

**Qualifications**
+ **Education:** Bachelor's degree in Computer Science, IT, or equivalent experience.
+ **Platform Expertise:** 5+ years of experience specifically operating **Red Hat OpenShift (OCP)** in a production environment.
+ **Hardware Fluency:** Deep experience racking/stacking and cabling high-density GPU systems (e.g., NVIDIA DGX or similar) and specialized AI/ML hardware.
+ **Infrastructure as Code (IaC):** Advanced proficiency in **Ansible**…