Site Reliability Lead Engineer

Job in Fort Mill, York County, South Carolina, 29715, USA
Listing for: International Solutions Group
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD per year
Job Description

Summary

A senior technical leader responsible for owning the reliability strategy, leading the SRE team, and ensuring the operational health, scalability, and availability of services. The role combines hands-on engineering, automation, and people leadership to drive reliability across the organization.

Core responsibilities

Strategy & process
  • Define SRE strategy, process frameworks, standards, and best practices.
  • Establish SLIs, SLOs, and error budget policies; embed reliability into the SDLC.
  • Promote a culture of service ownership and maintain strong cross-team feedback loops.
Reliability & capacity
  • Oversee monitoring and maintenance to meet SLAs and uptime targets.
  • Drive capacity planning and forecasting to ensure performance at scale.
  • Use data and metrics to prioritize reliability investments and tradeoffs.
Automation & tooling
  • Lead automation efforts to eliminate operational toil and streamline runbooks.
  • Oversee Infrastructure as Code practices (for example Terraform, CloudFormation) and configuration management.
  • Improve CI/CD pipelines to enable safer, faster releases.
Incident & change management
  • Lead incident response and communications during outages.
  • Conduct blameless postmortems and ensure corrective actions are executed.
  • Govern change control to ensure safe, tested production deployments.
Collaboration & communication
  • Partner with engineering, architecture, and product teams to bake reliability into designs and roadmaps.
  • Translate technical issues and tradeoffs for technical and nontechnical stakeholders.
Team leadership
  • Hire, mentor, and develop SRE engineers; set team goals and a roadmap.
  • Lead calmly and effectively under pressure during critical incidents, and drive customer-focused decisions.
Qualifications & skills

Technical

  • Proven SRE/DevOps/infrastructure experience (typically 6 years) with leadership experience (about 2-3 years).
  • Strong cloud experience (AWS preferred), containerization (Docker), and orchestration (Kubernetes).
  • Expertise with IaC and automation tools (Terraform, CloudFormation, Ansible, or similar).
  • Proficient in scripting and programming for automation (Python, Bash, or similar).
  • Deep experience with monitoring and observability tooling (Prometheus, Grafana, ELK Stack, Splunk, Datadog, etc.).
Leadership & soft skills
  • Strong people leadership and coaching skills with proven stakeholder communication.
  • Excellent problem-solving, analytical thinking, and adaptability.
  • Strategic mindset balancing engineering excellence with business priorities.
Deliverables
  • A measurable reliability roadmap aligned to business goals.
  • Reduced operational toil through automation and improved runbooks.
  • Clear SLIs, SLOs, and established error budget governance.
  • A high-performing SRE team with documented processes for incident and change management.