Azure SRE: Reliability, Observability & Incident Leadership
Job in
Deerfield, Lake County, Illinois, 60063, USA
Listed on 2026-05-28
Listing for:
Veriipro
Full Time position
Job specializations:
- IT/Tech
Cloud Computing, SRE/Site Reliability, Systems Engineer, IT Support
Job Description
We are looking for an experienced Site Reliability Engineer (SRE) to ensure the reliability, availability, and performance of Azure-based services in a large-scale enterprise environment. This role involves managing cloud infrastructure, enhancing observability, implementing disaster recovery strategies, and driving reliability improvements through SLOs/SLIs and automation.
Key Responsibilities
- Define and manage SLOs, SLIs, and error budgets for Azure-hosted services, reporting SLA compliance to stakeholders.
- Lead architectural reviews, ensuring reliability targets (availability, RTO/RPO) are met from design to production.
- Implement chaos engineering practices and conduct disaster recovery drills across Azure regions.
- Serve as Incident Commander for P1/P2 incidents, owning the incident lifecycle and post-mortem actions.
- Design and operate enterprise observability using Azure Monitor, Log Analytics, Application Insights, and Grafana.
- Develop alerting frameworks and automate self-healing operations with Azure Automation and scripting (Python/PowerShell).
- Embed reliability gates in CI/CD pipelines and manage AKS cluster reliability (scaling, upgrades, security).
- Enforce infrastructure-as-code best practices with Terraform/Bicep for Azure Landing Zones.
Qualifications
- 7+ years in SRE, platform engineering, or cloud infrastructure in large-scale environments.
- 4+ years of hands-on Azure experience with AKS and cloud engineering.
- Expertise in Terraform (required), Bicep, and managing Azure Landing Zones.
- Proficiency in Python, Go, or PowerShell scripting.
- Experience with Azure observability tools (Monitor, Log Analytics, Application Insights).
- Proven track record of owning SLOs/SLIs and improving production reliability.