Sr. Technical Program Manager, Prime Video Sports Linear and News
Listed on 2026-02-16
IT/Tech
Systems Engineer, SRE/Site Reliability
Overview
Love sports? Want to change the world? Prime Video is re-inventing the live sports experience by merging high-quality live streaming, immersive interactive features, and exclusive access to some of the world's most loved sports properties – including the NFL, NBA, English Premier League and UEFA Champions League soccer, Roland-Garros (French Open), the US Open and more. Every day we face the challenges of a fast-paced market and expanding technology set.
And we're building at a scale only Amazon is capable of. The Prime Video Sports team is a single-threaded owner, uniquely positioned to execute our vision across the entire Prime Video stack to deliver the most compelling new experiences to our customers. If this sounds like the type of experience you'd like to help build – apply here and let's chat!
We are deeply connected with the idea that live sports events are fundamentally different: there is only one shot at delivering the moment before it is lost forever. Our mission is to make sure that our customers never miss a single moment of the sports they love. To do this, we need to deliver best-in-class live sports infrastructure, testing, monitoring, and operations that optimize the delivery of live sports experiences. Our team engineers for speed, quality, availability, and scale.
What We're Looking For
We need a highly talented Senior Technical Program Manager to build resilient, highly available, and operationally excellent systems that power our vision. You’ll anticipate availability risks, manage critical incidents and escalations, and balance business SLA commitments against technical constraints and operational trade-offs.
You must break down complex reliability initiatives into manageable operational improvements, develop comprehensive availability specifications, and deliver measurable enhancements to system uptime and operational health. We seek someone with a strong curiosity to build and nurture an operationally mature, reliability-focused engineering culture—fostering proactive monitoring practices, blameless post-mortem disciplines, and integrating emerging observability and chaos engineering technologies into our operational excellence framework.
What You Will Do Here
- Manage complex, high-impact availability and operational excellence initiatives spanning multiple teams. Balance competing priorities while maintaining focus on system reliability, operational health, and measurable availability outcomes.
- Transform operational incidents, availability gaps, and reliability concerns into prioritized technical specifications by partnering with engineering and operations teams to identify high-impact reliability requirements.
- Help teams distinguish between tactical incident fixes and strategic availability solutions, consolidating related operational needs into cohesive reliability improvement projects with clear business value and customer impact.
- Quantify operational and availability impact of proposed solutions by establishing and tracking key reliability metrics including uptime SLAs, MTTR, MTBF, error rates, and customer-impacting incidents that demonstrate business value and justify engineering resources.
- Develop comprehensive measurement frameworks that track improvements in system availability, operational efficiency, incident reduction, and service reliability. Establish baseline metrics and drive continuous improvement through data-driven decision making.
- Establish clear timelines with measurable milestones for availability improvements, operational runbook development, monitoring enhancements, and incident response optimization that align with business SLA commitments.
- Track cross-team commitments for operational readiness, document action items from incident reviews and COE (Correction of Error) investigations, and provide concise status updates to leadership that highlight availability trends, operational risks, mitigation strategies, and progress toward reliability goals.
- Drive operational excellence practices including blameless post-mortems, proactive monitoring and alarming strategies, capacity planning, and disaster recovery preparedness across engineering teams.
A great…