Lead Director, Site Reliability Engineering - Client Experience

Job in Richardson, Dallas County, Texas, 75080, USA
Listing for: 9025 CVS Shared Services Resources LLC
Full Time position
Listed on 2026-06-03
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Systems Engineer
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly
Job Description & How to Apply Below
Position: Lead Director, Site Reliability Engineering - Client Experience

Position Summary

The Lead Director – Site Reliability Engineering - Client Experience is responsible for building, leading, and scaling hands‑on SRE teams supporting Adjudication and Client Experience platforms across on‑premises, Azure, and GCP environments. This role owns end‑to‑end reliability engineering, from defining SLOs and error budgets to designing resilient cloud architectures, automating operations, and embedding reliability directly into the SDLC. The ideal candidate is a deeply technical leader who has personally designed, operated, and scaled highly available distributed systems and can coach teams to do the same.

You will work closely with engineering, architecture, product, infrastructure, and security teams to shift operations from reactive to predictive, reduce operational toil, and ensure platform stability at enterprise scale.

Key Responsibilities
  • Lead and grow hands‑on SRE teams responsible for reliability, scalability, performance, and availability of Tier‑1 services across Azure and GCP
  • Establish and enforce SRE best practices, including SLIs, SLOs, error budgets, toil reduction, and automation‑first operations
  • Review and influence architecture, reliability designs, and failure modes for critical platforms and services
  • Drive cloud‑native reliability patterns, including autoscaling, graceful degradation, resilience testing, and disaster recovery
  • Own incident management, serving as an escalation leader and championing blameless post‑mortems and systemic fixes
  • Lead root cause analysis and ensure corrective actions result in measurable reliability improvements
  • Define and standardize monitoring, alerting, and observability across distributed systems using metrics, logs, and traces
  • Implement predictive operations and AI‑Ops capabilities, including anomaly detection, automated triage, and remediation
  • Lead reliability engineering for multi‑cloud environments (Azure & GCP), including Kubernetes platforms (AKS, GKE)
  • Ensure pre‑season readiness and year‑round capacity planning based on historical usage and growth forecasts
  • Drive consistency in CI/CD, deployment strategies, and rollback mechanisms across teams
  • Embed reliability into the SDLC, shifting accountability left into design, development, and testing
  • Reduce operational toil through automation, self‑service platforms, and standardized runbooks
  • Lead modernization initiatives that replace manual operations with engineering‑driven reliability solutions
  • Communicate platform health, risks, and improvements using data‑driven reliability metrics
  • Ensure systems meet security, compliance, and regulatory requirements
Required Qualifications
  • 10+ years of progressive experience in engineering or SRE organizations
  • 5+ years of experience managing senior engineers and leaders
  • 5+ years of hands‑on experience designing, deploying, and operating systems in cloud environments (Azure and/or GCP)
  • Proven experience building or scaling SRE practices, including SLOs, SLIs, incident response, and post‑mortems
  • Strong background in distributed systems, microservices, APIs, and cloud‑native architectures
  • Experience leading teams through platform modernization or reliability transformation initiatives
Preferred Qualifications
  • Deep expertise with Kubernetes‑based platforms (AKS, GKE; OpenShift a plus)
  • Experience implementing AI‑Ops, automation, and predictive reliability solutions
  • Strong understanding of observability platforms and modern monitoring strategies
  • Track record of reducing outages, improving MTTR, and scaling reliability at enterprise scale
  • Ability to operate with a startup mindset while navigating complex enterprise environments
  • Excellent communication and stakeholder management skills with the ability to influence at all levels
Education
  • Bachelor’s degree or equivalent experience
Pay Range

The typical pay range for this role is: $ - $. This pay range represents the base hourly rate or base annual full‑time salary for all positions in the job grade within which this position falls. The actual base salary offer will depend on a variety of factors including experience, education, geography and other relevant factors. This position is eligible for a CVS Health bonus, commission or…
