Technical Program Manager, ML Developer Experience and Infrastructure Reliability

Job in Kurtistown, Hawaii County, Hawaii, 96760, USA
Listing for: Waymo
Full Time position
Listed on 2026-02-07
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Position: Technical Program Manager, ML Developer Experience and Infrastructure Reliability
Location: Kurtistown

Overview

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases.

The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads across 15+ U.S. states and tens of billions of miles in simulation.

Waymo’s Technical Program Managers and Program Managers are accountable for Waymo’s roadmap execution by providing thoughtful cross-functional planning, clarity, and proactive risk management. In the face of complex technical and operational challenges with no established playbooks to follow, we act with thoughtful urgency, driving conversations, discussions, and outcomes. Our team partners closely with every function of Waymo to structure, own and drive work towards real-world deployments of the Waymo Driver across platforms and geographies.

In this hybrid role, you will report to a Technical Program Management Director.

Responsibilities
  • Drive the "Golden Path" for ML:
    Lead cross-functional execution to define and invest in a simplified "golden path" for ML development for Onboard and Waymo FM development, targeting the reduction of friction and low reliability in the "inner loop"
  • Manage Reliability Operations:
    Ensure smooth day-to-day operations of the reliability triage ecosystem, keeping queues healthy through interaction with rotation members and driving automation of queue management
  • Program Implementation for Infra Stability:
    Drive "contract-based reliability" programs across Onboard domains
  • Bridge ML and Infra:
    Facilitate communication and alignment between ML research, infrastructure foundations, and onboard teams to resolve blockers in core workflows like root-causing brittle pipelines
  • Strategic Roadmap Tracking:
    Contribute to strategic planning and track project progress, risks, and KPIs related to ML developer productivity and infrastructure reliability for leadership reporting
  • Resolve Systemic Blockers:
    Proactively identify and resolve roadblocks in the ML development cycle, such as data fragmentation and complex tooling, that currently hinder developer velocity
Qualifications
  • Technical Education:
    A Bachelor's degree in Computer Science, Engineering, or a related technical field
  • TPM Experience:
    5+ years of experience as a Technical Program Manager in a software engineering or large-scale infrastructure environment
  • ML/Reliability Track Record:
    Proven track record of managing complex technical projects involving machine learning infrastructure, developer experience (DevX), or site reliability engineering (SRE)
  • Program Ownership:
    Experience owning and driving programs end-to-end, including managing timelines, risks, and dependencies across multiple senior stakeholders
  • Analytical Problem Solving:
    Strong analytical and technical judgment skills, with the ability to use data to diagnose and solve systemic engineering bottlenecks
  • Communication Mastery:
    Excellent communication and interpersonal skills, with a demonstrated ability to convey complex technical concepts to both researchers and infrastructure engineers
Preferred
  • Advanced ML Operations:
    Experience with ML observability, root-causing production pipelines, and automating large-scale offline inference or model training experiments
  • Large-Scale Data Management:
    Background in managing multi-petabyte scale datasets, data validation frameworks, or unified data management solutions
  • Reliability Frameworks:
    Familiarity with contract-based reliability models, SLO management for autonomous systems, or reliability triage ecosystems
  • Developer Platforms:
    Experience building or managing "golden path" developer platforms or developer tooling that simplifies complex, fragmented tech stacks
  • Advanced Degree:
    Master’s degree or PhD in a related technical field
  • Autonomous…