Senior Director - Reliability Operations
Listed on 2026-06-28
IT/Tech
IT Project Manager, SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, Systems Engineer
The Senior Director - Reliability Operations is a strategic leader accountable for ensuring the reliability, availability, and performance of the enterprise technology ecosystem. This role oversees all ITIL-based service management functions, Site Reliability Engineering (SRE), the ServiceNow platform, Mission Control, and Live Sight Insights. This leader drives operational excellence through a proactive reliability strategy that combines process discipline, automation, observability, and real-time insights.
They will partner closely with engineering, infrastructure, cybersecurity, and product teams to build and sustain resilient systems that power Gap Inc.'s digital and in-store experiences. As a thought leader, the Sr. Director will shape the long-term vision for operational reliability and service management—defining modern capabilities, optimizing service performance, and establishing an innovation-driven reliability culture.
The responsibilities include:
Strategic Leadership & Vision
- Define and execute the enterprise Reliability Operations strategy, ensuring alignment with business objectives and technology roadmaps.
- Lead transformation of ITIL functions into agile, data-driven service management capabilities across incident, problem, change, and configuration management.
- Partner with senior technology and business leaders to embed reliability and performance metrics into product development and operational planning.
Operational Excellence & Reliability Engineering
- Lead Site Reliability Engineering (SRE) practices across platforms and services—driving automation, self-healing capabilities, and proactive monitoring to achieve measurable service resiliency improvements.
- Establish standards for availability, latency, scalability, and operational efficiency through engineering-driven reliability principles.
- Champion reliability by design—ensuring observability, capacity planning, and chaos testing are core to delivery processes.
Mission Control & Live Sight Insights
- Oversee the Mission Control organization responsible for real-time system monitoring, incident command, and critical event management.
- Drive adoption of Live Sight Insights to create predictive and actionable intelligence on service health and performance trends.
- Enable enterprise visibility of key metrics through intuitive dashboards and business-impact-based alerting models.
ServiceNow Governance Ownership
- Own the ServiceNow platform governance strategy and roadmap, ensuring it enables ITIL process excellence, automation, and cross-enterprise workflow integration.
- Collaborate with product and engineering teams to provide industry best practices for ServiceNow's capabilities, including IT, HR, Security, and Enterprise Operations.
- Lead with a platform governance mindset, focusing on reliability, scalability, and ease of use.
People Leadership & Culture
- Build, inspire, and develop a high-performing global Reliability Operations team that embodies accountability, collaboration, and innovation.
- Foster a culture of data-driven decision making, continuous learning, and operational excellence.
- Serve as a mentor and coach to emerging leaders—raising the organizational bar for reliability engineering and service leadership.
Cross-Functional Partnership
- Work closely with Software Engineering, Infrastructure, Cybersecurity, and Business Technology teams to ensure reliability objectives are integrated end-to-end.
- Partner with Enterprise Architecture and Program Management to align technology investments with reliability outcomes.
- Act as a trusted advisor to executive leadership on reliability strategy, risk posture, and performance health of the enterprise environment.
Who you are:
- Proven strategic leader with more than 10 years of success driving operational transformation at scale in global, complex environments.
- Deep expertise in ITIL frameworks, SRE principles, ServiceNow platform administration and architecture, and modern observability practices.
- Strong technical understanding across infrastructure, cloud operations, automation, and service management ecosystems.
- Exceptional ability to influence at all levels—translating…