Senior Technical Program Manager

Job in Seattle, King County, Washington, 98101, USA
Listing for: NVIDIA
Full Time position
Listed on 2026-07-03
Job specializations:
  • IT/Tech
    Data Engineering, IT Project Manager, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 168,000 USD per year

Senior Technical Program Manager

As a Senior Technical Program Manager with a passion for data-driven operations, you will lead the DGX Cloud Fleet Health reporting program — delivering real-time, actionable insights on the availability and reliability of our GPU fleet. A core focus of this role is advancing Mean-Time-Between-Interruption (MTBI): understanding the root causes of fleet interruptions, surfacing patterns in the data, and driving cross-functional programs to measurably extend fleet uptime.

You will partner closely with Capacity Operations, Infrastructure, SRE, and Engineering teams to translate complex fleet signals into decisions that directly improve customer experience. Join us in making a significant impact on the world's most powerful AI infrastructure.

What You'll Be Doing:
  • Define and own the metrics framework for measuring fleet health, reliability, and MTBI across a diverse and rapidly scaling GPU fleet.
  • Lead hands-on data investigations — querying telemetry, correlating failure signals, and building statistical models — to identify the root causes of interruptions and quantify their impact.
  • Own and drive execution of cross-functional MTBI improvement programs end-to-end — from translating analytical findings into a prioritized roadmap, to holding teams accountable to milestones and delivering measurable reliability gains.
  • Build and maintain dashboards, automated anomaly detection, and alerting frameworks that surface gaps in fleet health reporting in real time.
  • Anticipate and close reporting gaps with new cloud providers and hardware platforms by working closely with Infrastructure bring-up teams.
  • Communicate complex data findings and program status clearly to senior leadership, turning raw signals into crisp narratives and recommendations.

What We Need to See:
  • 8+ years of Technical Program Management experience, with at least 3 years in infrastructure, platform, or reliability-focused domains.
  • Strong hands-on data analytics skills — comfortable writing SQL, working with large telemetry datasets, and building dashboards (Grafana, Superset, Databricks, or equivalent).
  • Demonstrated ability to define and operationalize reliability metrics (MTBI, MTTR, availability SLAs) and drive engineering teams toward measurable improvements.
  • Proven ability to lead deep-dive investigations across ambiguous, multi-system problems and translate findings into long-term solutions.
  • Excellent executive communication skills — able to distill complex technical findings into clear, decision-ready narratives for senior leadership.
  • MS in EE, CS, or equivalent experience.

Ways to Stand Out From the Crowd:
  • Familiarity with NVIDIA GPU architectures and DGX/HGX infrastructure.
  • Experience with Databricks, Apache Spark, or other large-scale data processing platforms.
  • Hands-on experience with Grafana, Superset, or similar observability/BI tooling.
  • Background in cloud-native infrastructure, Kubernetes, or large-scale distributed systems.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 168,000 USD - 258,750 USD for Level 4, and 200,000 USD - 322,000 USD for Level 5.

You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until July 4, 2026.

NVIDIA is committed to fostering an inclusive work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.

Position Requirements
10+ years of work experience