
Product Manager, Managed Services

Job in New York, New York County, New York, 10261, USA
Listing for: Fluidstack
Full Time position
Listed on 2026-03-06
Job specializations:
  • IT/Tech
    Systems Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD per year
Job Description
Location: New York

About Fluidstack

At Fluidstack, we’re building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises (including Mistral, Poolside, Black Forest Labs, Meta, and more) to unlock compute at the speed of light.

We’re working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world‑class infrastructure. We treat our customers’ outcomes as our own, taking pride in the systems we build and the trust we earn. If you’re motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what's next.

About the Role

We're hiring a Product Manager to own our managed services portfolio, including SLURM and Kubernetes control planes. You'll define the product vision and roadmap for how enterprises deploy, manage, and scale workloads on Fluidstack's infrastructure, from initial cluster provisioning through lifecycle management, observability, and optimization. This role sits at the intersection of infrastructure, developer experience, and operational excellence. You'll work closely with engineering, datacenter operations, and customer-facing teams to build control plane capabilities that scale to mega clusters of 100k+ GPUs.

What you'll do
  • Own the product roadmap for managed SLURM and Kubernetes offerings, including control plane architecture, autoscaling, multi‑tenancy, and cluster lifecycle management.

  • Define requirements for control plane performance, reliability, and availability, including API rate limits, etcd scaling, provisioning tiers, and failure recovery mechanisms.

  • Work with engineering to design automated provisioning workflows, health monitoring systems, and node lifecycle controllers that minimize cluster downtime and maximize GPU utilization.

  • Partner with datacenter and networking teams to ensure control plane infrastructure scales seamlessly across geographic regions and supports hybrid deployment models.

  • Drive decisions on when to build vs. integrate with ecosystem tools (Rancher, OpenShift, Slurm accounting, workload orchestrators) based on customer requirements and competitive positioning.

  • Define metrics and SLAs for control plane uptime, API performance, scheduler throughput, and pod/job launch latency.

  • Conduct customer discovery to understand pain points around cluster management, job queueing, resource allocation, and multi‑cluster orchestration.

  • Create product documentation, deployment guides, and reference architectures for enterprise customers running large‑scale AI training and inference workloads.

  • Analyze competitive offerings from AWS EKS, Google GKE, DigitalOcean DOKS, and specialized HPC providers to inform feature prioritization and pricing strategy.

About you
  • 5+ years of product management experience, with at least 3 years focused on infrastructure, platform, or cloud services.

  • Deep technical understanding of Kubernetes control plane architecture (kube‑apiserver, etcd, scheduler, controller‑manager) and SLURM job scheduling.

  • Experience building or managing infrastructure products that serve technical users (platform engineers, ML engineers, researchers).

  • Track record of shipping features that improved cluster reliability, reduced time‑to‑deployment, or increased resource efficiency at scale.

  • Strong grasp of distributed systems concepts: consensus protocols, failure modes, backpressure handling, and operational complexity tradeoffs.

  • Familiarity with GPU workload patterns (multi‑node training, inference serving, batch processing) and how control plane design affects performance.

  • Ability to synthesize customer feedback, operational data, and competitive intelligence into clear product requirements and technical specifications.

  • Experience working with engineering teams to debug production incidents, analyze root causes, and translate findings into product improvements.

  • Comfortable navigating ambiguity and making pragmatic tradeoffs between feature completeness, time‑to‑market, and technical debt.

  • Bonus:
    Experience with HPC schedulers (LSF, PBS, Grid Engine), cloud‑native storage (Ceph, Lustre), or datacenter automation.

Compensation

To provide…
