Site Reliability Engineer Managing Director Job Charlotte area,North Carolina USA,IT/Tech

Site Reliability Engineer Managing Director

Location:

Charlotte, NC (onsite) 3 days a week hybrid

Employment type:

Full-time

Seniority level:
Director

Base pay range: $/yr - $/yr

Additional compensation:
Annual Bonus

Job Title:

SRE Manager

Direct Hire

Key Responsibilities

Lead the expansion of SRE practices from a small and high performing team to a larger global function incorporating on‑premise infrastructure technologies.
Evaluate current operational workflows and RACIs, identify toil and complete assessment of skills across the global team.
Execute a comprehensive roadmap to transition reactive operational day‑to‑day activities into proactive, SRE‑aligned processes with a focus on reliability, automation, observability, and incident management.
Upskill team members through tailored training programs on SRE principles, cloud operations and automation tools.
Collaborate with architects, platform engineering, Service Now developers and application teams to define and implement an observability framework in order to enhance proactive incident detection and reduce MTTR.
Define and implement an automation framework to ensure sustainable, responsible, and effective use of automation to reduce toil and risk.
Define and regularly review SLIs, SLOs, SLAs, error budgets, and incident response processes.
Oversee recruitment, orientation, and professional development of the global SRE team.
Foster a high‑performing team culture.
Build strong relationships with internal and external stakeholders.
Prepare and present reports on operational performance.
Oversee incident response and post‑incident analysis processes and drive a culture of blameless post‑mortems across multiple teams.

Key Requirements

Proven experience in building and leading Operational and Engineering teams.

Adept at fostering collaboration between SRE and application development teams to drive operational excellence, reduce downtime, and help application teams accelerate delivery cycles.
Have defined and monitored SRE principles including SLIs, SLOs, SLAs, error budgets, and incident response strategies.
Has overseen incident response processes, skilled in post‑incident analysis and conducting blameless post‑mortems with multiple teams, driving proactive measures to prevent future incidents.
Experience of spearheading automation initiatives using Terraform, and significantly reducing infrastructure provisioning time.
Experience of Monitoring & Observability tools such as Logic Monitor, Azure Monitor, Prometheus, Grafana, Dynatrace and Splunk.
Experience with Service Now and Azure Dev Ops and solid understanding of Agile, ITIL and ITSM frameworks.
Strong expertise in Azure technologies. Experience with other CSPs highly beneficial.
Proficiency in IaC tools including Terraform.
Experience with SharePoint administration highly beneficial.
Experience with container orchestration.
Strong scripting or programming skills (e.g., Python, Power Shell).
Experience in managing other managers highly beneficial.

Benefits

#J-18808-Ljbffr


Increase/decrease your Search Radius (miles)



Job Posting Language