Site Reliability Engineer
Job in Deerfield, Lake County, Illinois, 60015, USA
Listed on 2026-06-02
Listing for: 3B Staffing
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Job Description & How to Apply Below
Greetings!
This is Harsh from Jconnect INC. Below is a requirement from my client. Please share your updated resume and the details below.
Title: Site Reliability Engineer
Location: Deerfield IL (Onsite)
Duration: Full-time only
JOB DESCRIPTION:
Must Have Technical/Functional Skills
- 7+ years of experience in SRE, platform engineering, or cloud infrastructure engineering in large-scale enterprise environments (10,000+ employees or equivalent complexity).
- Deep, hands-on expertise with Microsoft Azure - minimum 4 years in a primary Azure cloud engineering role.
- Expert-level proficiency with AKS: cluster lifecycle management, RBAC, network policies, pod security standards, cluster autoscaler, and Workload Identity.
- Strong infrastructure-as-code skills: Terraform (required) and/or Bicep; experience managing Azure Landing Zones or Enterprise-Scale architecture.
- Proficiency in at least one systems programming/scripting language: Python (preferred), Go, or PowerShell.
- Experience designing and operating enterprise observability platforms using Azure Monitor, Log Analytics, and Application Insights at scale.
- Demonstrable track record of owning SLOs/SLIs and delivering measurable reliability improvements in production.
- Strong knowledge of enterprise networking in Azure: Hub-and-Spoke/Virtual WAN, ExpressRoute, Azure Firewall, NSGs, Private Endpoints, and Private DNS Zones.
Certifications:
- AZ-104 | AZ-305 (Preferred) | AZ-400 (Preferred) | CKA | ITIL v4 Foundation
Roles & Responsibilities
Reliability & Availability Engineering
- Define, own, and enforce enterprise-wide SLOs, SLIs, and Error Budgets across all Tier-0 and Tier-1 Azure-hosted services; report SLA compliance to executive stakeholders monthly.
- Lead architectural reviews for new services and ensure reliability non-functionals (availability targets, RTO/RPO) are embedded from design through to production.
- Champion and implement chaos engineering practices using Azure Chaos Studio and custom fault injection frameworks to proactively surface reliability risks.
- Drive Disaster Recovery (DR) design and conduct quarterly DR drills across Azure paired regions.
Incident Management & On-Call
- Serve as Incident Commander for P1/P2 major incidents and own the end-to-end incident lifecycle, from detection through resolution and Post-Incident Review (PIR).
- Participate in a structured On-Call rotation with follow-the-sun global coverage; maintain response SLAs of