Site Reliability Engineer Managing Director
Job in
Charlotte, Mecklenburg County, North Carolina, 28245, USA
Listed on 2026-01-04
Listing for:
Gravity IT Resources
Full Time, Part Time position
Job specializations:
- IT/Tech: Cloud Computing, IT Project Manager
Job Description & How to Apply Below
Site Reliability Engineer Managing Director
Location:
Charlotte, NC (hybrid, 3 days a week onsite)
Employment type:
Full-time
Seniority level:
Director
Base pay range: $/yr - $/yr
Additional compensation:
Annual Bonus
Job Title:
SRE Manager
Direct Hire
Key Responsibilities
- Lead the expansion of SRE practices from a small, high‑performing team to a larger global function incorporating on‑premises infrastructure technologies.
- Evaluate current operational workflows and RACIs, identify toil, and complete an assessment of skills across the global team.
- Execute a comprehensive roadmap to transition reactive operational day‑to‑day activities into proactive, SRE‑aligned processes with a focus on reliability, automation, observability, and incident management.
- Upskill team members through tailored training programs on SRE principles, cloud operations and automation tools.
- Collaborate with architects, platform engineering, ServiceNow developers, and application teams to define and implement an observability framework that enhances proactive incident detection and reduces MTTR.
- Define and implement an automation framework to ensure sustainable, responsible, and effective use of automation to reduce toil and risk.
- Define and regularly review SLIs, SLOs, SLAs, error budgets, and incident response processes.
- Oversee recruitment, orientation, and professional development of the global SRE team.
- Foster a high‑performing team culture.
- Build strong relationships with internal and external stakeholders.
- Prepare and present reports on operational performance.
- Oversee incident response and post‑incident analysis processes and drive a culture of blameless post‑mortems across multiple teams.
Qualifications
- Proven experience in building and leading operational and engineering teams.
- Adept at fostering collaboration between SRE and application development teams to drive operational excellence, reduce downtime, and help application teams accelerate delivery cycles.
- Has defined and monitored SRE principles, including SLIs, SLOs, SLAs, error budgets, and incident response strategies.
- Has overseen incident response processes and is skilled in post‑incident analysis and in conducting blameless post‑mortems with multiple teams, driving proactive measures to prevent future incidents.
- Experience spearheading automation initiatives using Terraform that significantly reduced infrastructure provisioning time.
- Experience with monitoring and observability tools such as LogicMonitor, Azure Monitor, Prometheus, Grafana, Dynatrace, and Splunk.
- Experience with ServiceNow and Azure DevOps, and a solid understanding of Agile, ITIL, and ITSM frameworks.
- Strong expertise in Azure technologies. Experience with other CSPs highly beneficial.
- Proficiency in IaC tools including Terraform.
- Experience with SharePoint administration highly beneficial.
- Experience with container orchestration.
- Strong scripting or programming skills (e.g., Python, PowerShell).
- Experience in managing other managers highly beneficial.
Benefits
- Pension plan
- Medical insurance
- Vision insurance
- 401(k)