Lead Site Reliability Engineer
Listed on 2026-05-31
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing
Lead Site Reliability Engineer (SRE)
This role owns reliability outcomes for a modern split‑plane, multi‑region SaaS platform serving enterprise customers. It focuses on system design, reliability strategy, and cross‑team execution, with direct impact on SLO attainment, MTTR reduction, cost efficiency, and overall service health.
Responsibilities
- Define and drive reliability strategy across control‑plane and data‑plane systems, including multi‑region resilience, BCDR, and failover design.
- Establish and operationalize SLOs, SLAs, and error budgets, ensuring they inform planning and engineering tradeoffs.
- Lead initiatives that measurably improve MTTR, incident prevention, and overall service health.
- Own incident management end‑to‑end, driving systemic fixes and long‑term reliability improvements beyond immediate response.
- Lead architecture and design reviews to ensure systems meet scalability, reliability, and cost‑efficiency goals.
- Champion automation and modernization, including AI‑driven reliability improvements.
- Establish and enforce code quality and review standards.
- Lead cross‑functional initiatives and align engineering with product priorities.
- Mentor senior engineers and act as a technical leader across teams.
Qualifications
- 6+ years leading delivery of complex, distributed systems or SaaS platforms.
- Strong experience with multi‑region, split‑plane architectures (control‑plane / data‑plane).
- Proven track record improving SLOs, MTTR, and system reliability at scale.
- Proficiency in languages such as Python, Java, C++, or JavaScript.
- Deep experience with Kubernetes (multi‑cluster), CI/CD, and GitOps (ArgoCD); SLO/SLA design; observability; incident management; infrastructure as code and cloud platforms; disaster recovery, resilience, and security best practices.
- Strong leadership skills with experience mentoring senior engineers and influencing cross‑team decisions.
Preferred Qualifications
- Experience with chaos engineering and large‑scale reliability automation.
- Background in enterprise SaaS platforms or split‑plane architectures.
- Expertise in navigating, understanding, and leveraging modern observability platforms (Datadog, Grafana, etc.).
The salary range for this role in the United States is $136,000 – $177,000. Employees may also be eligible for a wide range of other benefits, including bonuses, commissions, medical, retirement, financial, wellness, time off, employee discounts, and more.
Benefits & Perks
Alteryx offers a comprehensive benefits package, viewable on the company site. The role follows relevant fair‑chance ordinances and will comply with U.S. export controls.
Equal Employment Opportunity Statement
Alteryx, Inc. is an Equal Employment Opportunity Employer.