Senior Site Reliability Engineer - AIOps & Cloud Automation
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-05-29
Listing for: Palo Alto Networks
Full Time position
Job specializations:
- IT/Tech
- SRE/Site Reliability, Cloud Computing
Job Description
Requirements
- BS or MS in Computer Science, a related field, or equivalent professional experience
- Expertise in configuration management with frameworks such as Ansible, Terraform, or Helm
- Experience in Production Engineering, DevOps, or Site Reliability
- Expertise in private or public cloud environments
- Strong skills in Linux administration, Linux internals, and network troubleshooting
- Proficiency in programming and scripting languages such as Python, Golang, and shell to automate tasks
- Familiarity with CI/CD pipelines; GitLab and GitHub experience preferred
- Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
- Excellent written and verbal communication skills, with the ability to collaborate and rally support
- Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
- Passion for infrastructure as code and monitoring as code
- Ability to understand and dissect new technology stacks quickly
- As a Site Reliability Engineer, you will be part of a team supporting the services running on our infrastructure, covering automation, architecture, performance, metrics, troubleshooting, security, and reliability
- Our stack includes Kubernetes, Docker, GCP, AWS, Ansible, Terraform, Vault, GitLab, Spinnaker, TensorFlow, Datadog, Elasticsearch, Kafka, Hadoop, MySQL, Percona, MongoDB, Python, and Go. We don't expect you to know all of these, but we do expect you to learn the ones this role requires
- Contribute to the success of SRE and DevOps
- Develop expertise in new technologies
- Work with developers, researchers, data scientists, and security experts
- Design, build, and operate reliable, secure cloud infrastructure
- Ensure that applications are production-ready, scalable, and reliable
- Develop tools and automation frameworks
- Automate robust deployment of services
- Orchestrate end-to-end monitoring and alerting
- Participate in the on-call rotation with SRE and development teams
- Lead root cause analysis of critical business and production issues
- Mentor others and champion SRE culture
- Participate in design reviews
Position Requirements
10+ years of work experience