Senior Director - Reliability Operations
Listed on 2026-06-16
IT/Tech
IT Project Manager, SRE/Site Reliability, Cloud Computing: Infrastructure & Operations
About Gap Inc.
Gap Inc. creates culture as much as it creates clothes. Our ambition is to become a high‑performing house of iconic American brands that shape culture. We bring diverse perspectives through Old Navy, Gap, Banana Republic, and Athleta, each delivering unique value to customers. Our purpose is to bridge gaps between people, perspectives, and possibilities to create a better world. We build high‑performing teams that think boldly, take ownership, and turn ideas into impact.
About the Role
The Senior Director – Reliability Operations is a strategic leader responsible for ensuring the reliability, availability, and performance of Gap Inc.’s enterprise technology ecosystem. The role oversees ITIL‑based service management, Site Reliability Engineering (SRE), the ServiceNow platform, Mission Control, and Live Sight Insights. It drives operational excellence through a proactive reliability strategy that combines process discipline, automation, observability, and real‑time insights while partnering with engineering, infrastructure, cybersecurity, and product teams.
What You’ll Do
Strategic Leadership & Vision
- Define and execute the enterprise Reliability Operations strategy, ensuring alignment with business objectives and technology roadmaps.
- Lead transformation of ITIL functions into agile, data‑driven service management capabilities across incident, problem, change, and configuration management.
- Partner with senior technology and business leaders to embed reliability and performance metrics into product development and operational planning.
- Lead Site Reliability Engineering practices across platforms and services—driving automation, self‑healing capabilities, and proactive monitoring to achieve measurable service resiliency improvements.
- Establish standards for availability, latency, scalability, and operational efficiency through engineering‑driven reliability principles.
- Champion reliability by design—ensuring observability, capacity planning, and chaos testing are core to delivery processes.
- Oversee the Mission Control organization responsible for real‑time system monitoring, incident command, and critical event management.
- Drive adoption of Live Sight Insights to create predictive and actionable intelligence on service health and performance trends.
- Enable enterprise visibility of key metrics through intuitive dashboards and business‑impact‑based alerting models.
- Own the ServiceNow platform governance strategy and roadmap, ensuring it enables ITIL process excellence, automation, and cross‑enterprise workflow integration.
- Collaborate with product and engineering teams to apply industry best practices across ServiceNow’s capabilities in IT, HR, security, and enterprise operations.
- Lead with a governance mindset focused on reliability, scalability, and ease of use.
- Build, inspire, and develop a high‑performing global Reliability Operations team that embodies accountability, collaboration, and innovation.
- Foster a culture of data‑driven decision making, continuous learning, and operational excellence.
- Serve as a mentor and coach to emerging leaders—raising the organizational bar for reliability engineering and service leadership.
- Work closely with Software Engineering, Infrastructure, Cybersecurity, and Business Technology teams to ensure reliability objectives are integrated end‑to‑end.
- Partner with Enterprise Architecture and Program Management to align technology investments with reliability outcomes.
- Act as a trusted advisor to executive leadership on reliability strategy, risk posture, and performance health of the enterprise environment.
- Proven strategic leader with more than 10 years of experience driving operational transformation in global, complex environments.
- Deep expertise in ITIL frameworks, SRE principles, ServiceNow platform administration and architecture, and modern observability practices.
- Strong technical understanding across infrastructure, cloud operations, automation, and service…