
Product Manager, AI Platform

Job in New York, New York County, New York, 10261, USA
Listing for: Fluidstack
Full Time position
Listed on 2026-03-06
Job specializations:
  • IT/Tech
    AI Engineer, Systems Engineer
Salary/Wage Range or Industry Benchmark: USD 60,000–80,000 per year
Job Description & How to Apply Below
Location: New York

About Fluidstack

At Fluidstack, we’re building the infrastructure for abundant intelligence. We partner with top AI labs, governments, and enterprises, including Mistral, Poolside, Black Forest Labs, Meta, and more, to unlock compute at the speed of light.

We’re working with urgency to make AGI a reality. As such, our team is highly motivated and committed to delivering world‑class infrastructure. We treat our customers’ outcomes as our own, taking pride in the systems we build and the trust we earn. If you’re motivated by purpose, obsessed with excellence, and ready to work very hard to accelerate the future of intelligence, join us in building what’s next.

About the role

We’re hiring a Product Manager to own our AI platform roadmap, including managed inference and agent platforms. You’ll define how Fluidstack enables customers to deploy, scale, and optimize LLM inference workloads—from model serving and routing to agent orchestration and compound AI systems. This role requires balancing customer needs for low latency and high throughput with the operational realities of GPU utilization, cost efficiency, and platform reliability.

You’ll work across engineering, ML research, and go‑to‑market teams to position Fluidstack against inference‑first competitors like Together AI, Fireworks, Baseten, Modal, and Replicate.

What you’ll do
  • Own the product strategy and roadmap for managed inference services, including model deployment, autoscaling, multi‑LoRA serving, and inference optimization.

  • Define requirements for agent platform capabilities: structured outputs, function calling, memory primitives, tool integration, and multi‑step reasoning workflows.

  • Drive decisions on which inference optimizations to prioritize: speculative decoding, continuous batching, KV cache management, quantization support, and custom kernel integration.

  • Partner with ML infrastructure engineers to design APIs, SDKs, and deployment workflows that support model fine‑tuning, version management, and A/B testing.

  • Work with datacenter teams to optimize GPU allocation strategies—balancing dedicated vs. serverless deployments, cold start latency, and cost‑per‑token economics.

  • Analyze competitive offerings from Together AI (inference optimization stack), Fireworks (custom inference engine), Baseten (training‑to‑inference integration), and Modal (serverless architecture).

  • Define pricing models that align with customer usage patterns (tokens, requests, GPU‑hours) while maintaining healthy unit economics.

  • Conduct customer research to understand inference workload requirements: latency SLAs, throughput targets, model size constraints, and integration needs.

  • Translate customer feedback into feature specifications—including support for new model architectures, framework integrations (vLLM, TensorRT-LLM, TGI), and observability tooling.

  • Build go‑to‑market materials: reference architectures, performance benchmarks, cost calculators, and migration guides for customers moving from self‑hosted or competing platforms.
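The cost-per-token economics and cost-calculator work described above can be illustrated with a minimal sketch. All figures and parameter names here are hypothetical, chosen only to show the shape of the calculation; real serving economics also depend on batching behavior, prompt/output token mix, and autoscaling:

```python
def cost_per_million_tokens(gpu_hour_usd: float,
                            tokens_per_second: float,
                            utilization: float = 1.0) -> float:
    """Blended cost per 1M generated tokens for a dedicated GPU deployment.

    gpu_hour_usd      -- hourly price of the GPU (hypothetical figure)
    tokens_per_second -- sustained decode throughput of the deployment
    utilization       -- fraction of wall-clock time the GPU serves traffic
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hour_usd / tokens_per_hour * 1_000_000

# e.g. a $2.50/hr GPU sustaining 2,000 tokens/s at 80% utilization
print(round(cost_per_million_tokens(2.50, 2000, 0.8), 3))
```

A dedicated deployment amortizes a fixed hourly cost over whatever throughput it sustains, which is why utilization dominates the unit economics; serverless pricing inverts this by charging per token or per request and absorbing the idle time on the platform side.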

About you
  • 5+ years product management experience with at least 3 years focused on AI/ML infrastructure, inference platforms, or developer tools.

  • Strong technical understanding of transformer architectures, inference optimization techniques, and production ML systems.

  • Experience building products for technical users deploying LLMs in production (ML engineers, research scientists, AI application developers).

  • Track record of shipping features that improved inference latency, throughput, or cost efficiency—backed by quantitative metrics.

  • Deep familiarity with the inference ecosystem: serving frameworks (vLLM, TensorRT-LLM, TGI), model formats (GGUF, safetensors), and API standards (OpenAI-compatible endpoints).

  • Understanding of GPU memory constraints, batching strategies, and the tradeoffs between latency‑optimized vs. throughput‑optimized serving.

  • Ability to translate complex technical concepts (speculative decoding, PagedAttention, multi-LoRA) into clear customer value propositions.

  • Experience conducting competitive analysis in the inference market, including pricing elasticity, feature differentiation, and customer acquisition patterns.

  • Comfortable…
