
Lead Site Reliability Engineer

Job in Eden Prairie, Hennepin County, Minnesota, 55344, USA
Listing for: Eightelevengroup
Full Time position
Listed on 2026-03-02
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: USD 60.00 per hour
Job Description & How to Apply Below

Lead Site Reliability Engineer
Eden Prairie, MN; Arizona; or Telecommute
Hybrid role
Compensation: $60 per hour

ABOUT THE ROLE

Our client is seeking a Lead Site Reliability Engineer to serve as the technical anchor for a team dedicated to ensuring the stability, performance, and resiliency of its most critical applications. In this pivotal role, you will provide technical leadership on system architecture, workload placement, and transaction optimization. You will architect and implement solutions to enhance system reliability, drive incident management maturity, and lead efforts to eliminate single points of failure across both on-premises and cloud environments.

The successful candidate will mentor a team of engineers, optimize applications for performance and cost, and establish robust monitoring and observability practices. You will also play a key role in enhancing failover capabilities and ensuring services are designed for zonal and regional resiliency.

WHAT YOU'LL DO
  • Provide technical leadership and mentorship to engineers on Site Reliability Engineering (SRE) best practices.
  • Architect and implement solutions to improve system reliability and eliminate single points of failure for critical applications, including technologies such as Azure Front Door and Cloudflare.
  • Drive incident management maturity by reducing Mean Time to Recover (MTTR) to 60 minutes or less and conducting deep root cause analysis on P1/P2 incidents.
  • Lead the development and build-out of proactive monitoring solutions, including Business Journey Maps, using observability tools such as Dynatrace.
  • Establish and lead ongoing architectural review processes to ensure zonal and regional resiliency for cloud-native applications.
  • Partner with development teams to optimize applications for performance, reliability, and cost in the cloud.
  • Make direct technical adjustments to improve system stability and guide the team in eliminating single points of failure.
WHAT YOU BRING
  • 8+ years of experience in technical roles such as Software Engineering, Systems Engineering, or DevOps.
  • 3+ years of dedicated experience as a Site Reliability Engineer.
  • Deep expertise with at least one major cloud platform (Azure, AWS, GCP).
  • Proven experience with observability and monitoring tools (e.g., Dynatrace, Prometheus, Grafana).
  • Strong understanding of networking, distributed systems, and infrastructure-as-code (Terraform, Ansible).
  • Experience as a technical lead or mentor.
  • Proficiency in one or more programming languages (e.g., Python, Go, Java).
  • Experience in a full-stack engineering capacity.
  • Knowledge of containerization and orchestration (Docker, Kubernetes).