Site Reliability Engineer
Listed on 2026-06-30
IT/Tech
SRE/Site Reliability, Systems Engineer, Cloud Computing: Infrastructure & Operations
Looking for local candidates
Want to work in technology in the financial industry?
Our client is seeking a highly motivated Site Reliability Engineer responsible for ensuring reliability, scalability, and performance of large-scale systems and applications. The role blends software engineering, infrastructure engineering, and production support, with a strong focus on automation and observability.
Key Responsibilities
Reliability & Production Ownership
- Define and track service reliability goals (SLIs/SLOs) across applications
- Ensure high availability, scalability, and performance of systems
- Own production issues end-to-end and ensure problems do not recur
- Design monitoring, logging, and tracing systems (dashboards, alerts)
- Enhance operational visibility into platform performance
- Evaluate and improve monitoring coverage for new releases
- Automate manual operational tasks and workflows
- Build tools/software to reduce “toil” and improve efficiency
- Implement CI/CD pipelines and automation frameworks
- Participate in major incident triage and troubleshooting
- Identify and resolve root causes of complex outages
- Collaborate with problem management teams to prevent recurrence
- Work closely with software engineering, infrastructure, and architecture teams
- Influence adoption of reliable design patterns and best practices
- Drive early integration of non-functional requirements (reliability, scalability)
- Identify bottlenecks, capacity constraints, and vulnerabilities
- Optimize system performance and cost efficiency
- Plan for growth and scaling needs
- ~10–15+ years in SRE, software engineering, or infrastructure engineering
- Strong experience with cloud platforms (AWS/Azure)
- Proven experience supporting large-scale distributed systems
- Programming: Python, Java, or .NET
- DevOps: CI/CD tools (Jenkins, Git), GitOps
- Observability: Splunk, Prometheus, Grafana, Dynatrace
- Systems: Linux/Unix, networking, load balancing, DNS
- Service Level Indicators (SLIs) & Objectives (SLOs)
- Error budgets and reliability engineering practices
- Incident response and resiliency engineering
- Strong collaboration and stakeholder management
- Ability to lead initiatives and influence engineering culture
- Problem-solving in high-pressure production environments
- Base pay rate: $140, USD
This pay rate represents mthree's good faith and reasonable estimate of the base pay for this role at the time of posting based on the locations listed in the job advertisement. It is anticipated that qualified candidates selected for a placement will receive this pay rate as a starting salary once onsite with the mthree client. However, the ultimate salary offered may be higher or lower and will be set based on a variety of non-discriminatory factors, including but not limited to geographic location, skills, and competencies.
Applicants must be currently authorized to work in the United States on a full-time basis. The Company will not sponsor applicants for work visas.