Site Reliability Engineer Lead

Job in Cincinnati, Hamilton County, Ohio, 45208, USA
Listing for: TekLeaders, Inc
Full Time position
Listed on 2026-01-02
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Job Description & How to Apply Below

Role Name: Site Reliability Engineer – Lead

Cincinnati, OH – Hybrid (W2 only)

Role Description:

As a Site Reliability Engineer – Lead, you will drive the reliability, scalability, and performance of mission-critical systems and services while leading a team of SREs. This role combines deep technical expertise with leadership, mentoring, and strategic planning. You will set standards for operational excellence, guide incident response, and foster a culture of automation and continuous improvement. Collaboration with engineering, operations, and product teams is essential to align reliability initiatives with business objectives and ensure seamless service delivery.

REQUIRED SKILLS:

  • Proven experience in site reliability, DevOps, or systems engineering, with prior leadership or team lead responsibilities

  • Strong programming/scripting skills (e.g., Python, Go, Bash, or similar)

  • Deep expertise in Linux/Unix system administration and networking

  • Experience architecting and operating cloud platforms (AWS, Azure, Google Cloud Platform)

  • Proficiency with infrastructure-as-code and automation tools (e.g., Terraform, Ansible, CloudFormation)

  • Advanced knowledge of monitoring, logging, and alerting solutions (e.g., Prometheus, Grafana, ELK, Datadog)

  • Demonstrated incident management and root cause analysis skills

  • Experience designing and implementing CI/CD pipelines

  • Strong understanding of containerization and orchestration (Docker, Kubernetes)

  • Ability to define and enforce reliability, scalability, and security best practices

  • Excellent communication, stakeholder management, and collaboration skills

  • Experience mentoring, coaching, and developing SRE or engineering teams

  • Strong hands-on experience defining business process dashboards in APM tools such as Dynatrace, including the definition, design, and implementation of SLAs, SLOs, and SLIs as part of observability

  • Experience with devices such as scanners, POS devices, and peripheral devices (including devices with on-device memory)

  • Experience with hardcoded device protocols and software, including the ability to decode and run them and help integrate them with other modules

  • Experience with edge computing, Google Distributed Cloud, and hybrid cloud environments

  • Experience leading SRE teams in high-growth or regulated environments

  • Advanced database administration and optimization skills (both SQL, e.g., MySQL, and NoSQL, e.g., MongoDB)

Key Responsibilities:

  • Team Leadership & Development:

  • Bring technical expertise and hands-on experience, with the ability to lead the development team

  • Mentor team members and guide them on the right approach for SRE-related work

  • Foster a culture of operational excellence, automation, and continuous learning

  • Conduct regular team meetings, 1:1s, and performance reviews

  • Reliability Strategy & Architecture:

  • Define and implement reliability, scalability, and performance strategies for critical systems

  • Set standards for monitoring, alerting, and incident response

  • Guide architectural decisions to ensure robust, resilient infrastructure

  • Incident & Problem Management:

  • Oversee incident response, root cause analysis, and post-mortem processes

  • Coordinate with cross-functional teams to resolve complex issues and prevent recurrence

  • Drive improvements based on incident learnings

  • Process Improvement & Automation:

  • Identify and eliminate manual operational tasks through automation

  • Optimize CI/CD pipelines and deployment processes

  • Continuously enhance system reliability and efficiency

  • Stakeholder Collaboration:

  • Partner with engineering, operations, and product teams to align reliability goals with business objectives

  • Communicate reliability metrics, risks, and progress to leadership and stakeholders

  • Security & Compliance:

  • Ensure infrastructure and processes adhere to security best practices and compliance requirements

  • Experience with chaos and resilience engineering
