Senior Technical Product Manager Operations
Listed on 2026-06-06
IT/Tech
Systems Engineer, IT Support
Senior Technical Product Manager, Fleet Operations
London
About Nscale
Nscale is taking on the hyperscalers by building a vertically integrated GenAI cloud platform. We own the data centres, software, and applications that power today's AI stack using sustainable technology solutions. We thrive on a culture of relentless innovation, ownership, and accountability, where every team member takes pride in their work and drives it with excellence and urgency. As an Nscaler, you'll build trust through openness and transparency, where everyone is inspired to do their best work.
Collaboration is key, and we work together swiftly and respectfully, embracing adaptability and resilience in all we do.
Technical Product Managers at Nscale own the definition, delivery, and ongoing evolution of a slice of the Nscale platform. You partner closely with engineering, design, research, and go-to-market teams to translate customer problems and operational realities into shippable product outcomes. As a Senior Technical Product Manager for Fleet Operations, you own the product strategy for the day 0–2+ operational software that runs our global GPU fleet — the systems that bring capacity online, keep it healthy, and restore it fast when things go wrong.
You partner daily with Fleet Software engineering teams, SRE, and Support to turn operational pain into durable product: provisioning and bringup (day 0), testing and deployment (day 1), and the full lifecycle of monitoring, incident response, repair, RMA, firmware, and decommissioning (day 2+). You operate at team scope, owning a major product area and driving multi‑quarter initiatives that directly move fleet availability, utilisation, and time‑to‑recover.
- Own the strategy and roadmap for a significant Fleet Operations product area — e.g. provisioning and bring‑up, fleet health and telemetry, incident and repair workflows, firmware and lifecycle management, or capacity and inventory.
- Lead multi‑sprint, cross‑functional initiatives from problem framing through rollout across live GPU clusters, working hand‑in‑hand with Fleet Software, SRE, data centre operations, and Support.
- Turn operational ambiguity into product: shadow on‑call rotations, ride along with support and repair workflows, and translate recurring toil into tooling, automation, and platform capabilities.
- Define the metrics that matter for a GPU fleet — availability, utilisation, MTTR, time‑to‑bring‑up, hardware failure rates, support ticket deflection — and drive the roadmap against them.
- Partner with engineering on architecture and trade‑offs for systems that span bare metal, orchestration, observability, and control planes.
- Drive incident reviews and postmortems into product commitments; close the loop so the same class of issue doesn't recur.
- Mentor junior product managers and raise the quality bar for PRDs, reviews, and product decisions across the team.
- Represent Fleet Operations in planning, reviews, and leadership updates.
- 5–8 years of product management experience in software or technology, with a track record of owning significant product areas in infrastructure, platform, or operations‑facing products.
- Strong technical fluency in large‑scale systems: you can lead discussions with engineering on architecture, trade‑offs, and feasibility across provisioning, orchestration, observability, and control‑plane design.
- Experience building products for operators — SREs, NOC/support teams, data centre technicians, or similar — and a genuine appetite for understanding their workflows.
- Demonstrated ability to move from an ambiguous operational problem space to shipped product outcomes that measurably improve reliability, efficiency, or time‑to‑recover.
- Experience mentoring or informally leading peers.
- Excellent written and verbal communication; you can make complex product decisions legible to engineers, operators, and executives alike.
- Degree in computer science, engineering, or a related field, or prior experience as an engineer or SRE.
- Hands‑on background in cloud infrastructure, bare‑metal provisioning, fleet or hardware lifecycle management,…