Sr. Technical Program Manager, Prime Video Sports Linear and News
Job in
Seattle, King County, Washington, 98111, USA
Listed on 2026-02-12
Listing for:
Amazon
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, SRE/Site Reliability
Job Description & How to Apply Below
DESCRIPTION:
Love sports? Want to change the world? Prime Video is re-inventing the live sports experience by merging high-quality live streaming, immersive interactive features, and exclusive access to some of the world's most loved sports properties, including the NFL, NBA, English Premier League and UEFA Champions League soccer, Roland-Garros (French Open), the US Open and more. Every day we face the challenges of a fast-paced market and expanding technology set.
And we're building at a scale only Amazon is capable of. The Prime Video Sports team is a single-threaded owner, uniquely positioned to execute our vision across the entire Prime Video stack to deliver the most compelling new experiences to our customers. If this sounds like the type of experience you'd like to help build, apply here and let's chat!
Who we are:
We are deeply connected with the idea that live sports events are fundamentally different: there is only one shot at delivering the moment before it is lost forever. Our mission is to make sure that our customers never miss a single moment of the sports they love. To do this we need to deliver best-in-class live sports infrastructure, testing, monitoring, and operations to optimize the delivery of live sports experiences. Our team engineers for speed, quality, availability, and scale.
As Prime Video has grown, we've increased the depth and breadth of video products we support, with a corresponding increase in the associated size and complexity of our engineering activity and the load on our services. Our goal is to adapt our architecture to maintain and improve our velocity of delivery for customers in the face of these changes, and for our services to be on and error-free 24/7, especially for high-profile, exclusive content.
With AI as a transformative force, we're at an inflection point that enables us to unlock new opportunities in predictive analytics, automated defect analysis and mitigation, and automated operations that were previously beyond our reach.
What we're looking for:
We need a highly talented Senior Technical Program Manager to build resilient, highly available, and operationally excellent systems that power our vision. You'll anticipate availability risks, manage critical incidents and escalations, and balance business SLA commitments against technical constraints and operational trade-offs.
You must break down complex reliability initiatives into manageable operational improvements, develop comprehensive availability specifications, and deliver measurable enhancements to system uptime and operational health. We seek someone with a strong curiosity to build and nurture an operationally mature, reliability-focused engineering culture: fostering proactive monitoring practices and blameless post-mortem disciplines, and integrating emerging observability and chaos engineering technologies into our operational excellence framework.
What you will do here:
Manage complex, high-impact availability and operational excellence initiatives spanning multiple teams. Balance competing priorities while maintaining focus on system reliability, operational health, and measurable availability outcomes.
Transform operational incidents, availability gaps, and reliability concerns into prioritized technical specifications by partnering with engineering and operations teams to identify high-impact reliability requirements.
Help teams distinguish between tactical incident fixes and strategic availability solutions, consolidating related operational needs into cohesive reliability improvement projects with clear business value and customer impact.
Quantify operational and availability impact of proposed solutions by establishing and tracking key reliability metrics including uptime SLAs, MTTR (Mean Time To Recovery), MTBF (Mean Time Between Failures), error rates, and customer-impacting incidents that demonstrate business value and justify engineering resources.
Develop comprehensive measurement frameworks that track improvements in system availability, operational efficiency, incident reduction, and service reliability. Establish baseline metrics and drive continuous improvement through data-driven decision making.
Establish clear timelines with measurable milestones for availability improvements, operational runbook development, monitoring enhancements, and incident response optimization that align with business SLA commitments.
Track cross-team commitments for operational readiness, document action items from incident reviews and COE (Correction of Error) investigations, and provide concise status updates to leadership that highlight availability trends, operational risks, mitigation strategies, and progress toward reliability goals.
Drive operational excellence practices including blameless post-mortems, proactive monitoring and…