Site Reliability Engineer; SRE
Listed on 2026-06-13
IT/Tech
SRE/Site Reliability, Systems Engineer, Cloud Computing: Infrastructure & Operations
Site Reliability Engineer (SRE)
Chandler, AZ
$60-$70/hour
Hybrid: 3 days onsite, 2 days remote
18-Month W2 Contract
The individual in this role partners directly with Application Development and Production Support teams to implement the measures defined by the Site Reliability Engineer (SRE) Lead or Senior SRE in collaboration with their partners. This individual will ensure that the appropriate instrumentation, tooling, ticketing, alerting, and on-call routines are in place for key services. The role takes part in production triage and works with Problem Management to identify the root cause of issues as required, then uses the knowledge gained to partner closely with the Senior SRE and address any gaps in the reliability measurements and dashboards.
This role also focuses heavily on software development: delivering automated solutions that eliminate 'toil' and suggesting code enhancements to the Application Development teams.
- Collaborate with Development and Infrastructure teams to understand technical solutions and to implement the monitoring capabilities outlined in the application and system monitoring designs put forward by the SRE Lead.
- Mentor SRE resources on reliability practices and established tools/capabilities.
- Develop and maintain a catalog of extensible reliability scripts, tools and libraries that can be leveraged for common instrumentation, automation, and operational needs.
- Partner with Application Production Services (APS) and Application Development teammates to implement code changes that make use of common reliability libraries and tools, and help them understand how to use those libraries and tools.
- Engage as a subject matter expert (SME) in major incident triage and failure scenario modelling, and work with the Problem Manager to diagnose root causes in major incident and problem management investigations.
- Identify vulnerabilities and opportunities for reliability improvement, such as investigating low-level error rates and 'noise' in monitoring, and help define solutions to reduce manual support effort and/or improve system reliability.
- Participate regularly in an on-call rotation with Production Support teammates to learn more about reliability issues affecting their portfolio.