Site Reliability Engineer, Consultant
Listed on 2025-11-28
IT/Tech
Cloud Computing, Systems Engineer
Your Role
The Technology Operations Center (TOC) team provides 24x7 coverage of observability and monitoring events, including batch operations, to assure the successful execution and completion of critical business services within required timelines. The Site Reliability Engineer will report to the Manager, TOC. In this role you will be responsible for the reliability, scalability, and performance of our infrastructure and applications. You will work closely with development and operations teams to automate processes, monitor systems, and respond to incidents.
Our leadership model is about developing great leaders at all levels and creating opportunities for our people to grow - personally, professionally, and financially. We are looking for leaders that are energized by creative and critical thinking, building and sustaining high-performing teams, getting results the right way, and fostering continuous learning.
- Requires a BS degree in computer science or an equivalent field with 5+ years of experience, or an MS degree
- Requires 7+ years of experience engineering and/or operating production systems, or equivalent
- Cloud Platforms: Azure, AWS, GCP.
- Programming & Scripting Languages: Python, Go, Java, Bash, PowerShell, or similar.
- Containerization & Orchestration: Red Hat OpenShift, Kubernetes, Docker, Helm.
- Monitoring & Observability: Prometheus, Grafana, Datadog, New Relic, ELK Stack, Dynatrace, Splunk, BigPanda, SolarWinds.
- CI/CD & Configuration Management: Jenkins, GitHub Actions, GitLab CI, Argo CD, Spinnaker, Ansible, Chef, Puppet.
- Intelligent Automation & Agentic Systems: Familiarity with agentic AI systems and autonomous workflows for incident resolution, observability, and infrastructure optimization.