Capacity and Infrastructure Operations Manager
Listed on 2026-02-17
SF Bay Area (Hybrid)
Parasail is redefining AI infrastructure by enabling seamless deployment across a distributed network of GPUs, optimizing for cost, performance, and flexibility. Our mission is to empower AI developers with a fast, cost-efficient, and scalable cloud experience—free from vendor lock-in and designed for the next generation of AI workloads.
Role Overview
We’re hiring our first Capacity & Infrastructure Operations Manager to own the operational and analytical “supply side” of our GPU fleet. You will partner closely with Engineering, Finance, Product, and GTM to maximize utilization, manage vendor performance and risk, improve unit economics, and build the operating cadence and dashboards that keep supply healthy and cost-effective.
This is an operator role (not people management). You’ll drive execution through clear processes, metrics, reporting, vendor coordination, and cross-functional alignment.
We source capacity from neocloud and GPU infrastructure providers, including (for example) Hydra Host, Shadeform, and Volt Park, among others.
- Own real-time fleet utilization: identify and resolve idle capacity, inefficiencies, and demand/supply mismatches.
- Define utilization targets and operating policies that balance performance, reliability, and cost.
- Develop policies and processes for lifecycle management of vendor-sourced instances (bring-up, steady state, rebalancing, decommissioning).
- Partner with Engineering to define requirements and prioritize automations for capacity acquisition, scaling, rebalancing, failovers, and cost controls.
- Model and monitor GPU unit economics: cost per GPU-hr, marginal cost, blended vendor rates, and cost leakage.
- Partner with Finance & Product to align customer pricing with underlying vendor economics.
- Deliver monthly/quarterly reporting on supply-side cost trends and margin performance, including key drivers and recommended actions.
- Recommend improvements to pricing, contract mix, vendor allocation, and operational policies to expand gross margin.
- Build and maintain forecasting models to predict demand, burst behavior, seasonality, and reserve requirements.
- Determine the optimal mix of contract types (on-demand, committed use, short-term) to maximize flexibility and margin.
- Maintain capacity buffers and contingency plans to protect against vendor outages, degraded performance, or sudden demand spikes.
- Source, evaluate, and manage relationships with neocloud and GPU infrastructure providers.
- Negotiate pricing, SLAs, commitments, contractual flexibility, and scaling terms.
- Create and maintain vendor scorecards (pricing, reliability, latency, responsiveness, and fit).
- Identify emerging vendors, negotiate trial capacity, and assess cost–performance tradeoffs.
- Develop a multi-vendor redundancy strategy to minimize single-provider risk.
- Stand up the core dashboards and operating cadence to monitor and manage (examples):
  - utilization per GPU family
  - utilization by contract type
  - idle capacity
  - blended cost per GPU-hour
  - vendor latency / performance
  - error/outage risk indicators
- Help define a practical tool stack for capacity planning and financial analysis. Examples may include:
  - Spreadsheets: Excel / Google Sheets
  - Data & querying: SQL; data warehouses (e.g., Databricks/BigQuery)
  - BI / dashboards: Looker, Tableau, Metabase, Grafana (or equivalents)
  - Planning / FP&A platforms
  - Work management & documentation: ClickUp/Linear, Notion, Google Docs (or equivalents)
- Serve as the primary operational point of contact across vendors and internal teams for supply-side performance, risk, and escalations.
- Support Sales/GTM with capacity availability and supply risk inputs for large customer deals.
- Advise leadership on supply-side risks, mitigations, and operational opportunities.
- 5+ years in capacity operations, infrastructure operations, technical operations, cloud supply/vendor ops, or a closely related role.
- Demonstrated experience managing…
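Several of the unit-economics and dashboard metrics listed in the responsibilities above reduce to simple ratios: blended cost per GPU-hour, utilization by contract type, and idle capacity. The sketch below is minimal and rests on some assumptions. It uses a hypothetical per-allocation record that tracks paid and used GPU-hours. The `Allocation` schema, the function names, the vendor labels, and the numbers are all illustrative and are not drawn from Parasail's systems or data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Allocation:
    """One block of vendor capacity over a reporting period (hypothetical schema)."""
    vendor: str
    contract_type: str        # e.g. "on-demand", "committed", "short-term"
    gpu_hours_paid: float     # GPU-hours billed by the vendor
    gpu_hours_used: float     # GPU-hours actually serving workloads
    rate_per_gpu_hour: float  # contracted price, USD per GPU-hour


def total_spend(allocs):
    """Total vendor spend across all allocations."""
    return sum(a.gpu_hours_paid * a.rate_per_gpu_hour for a in allocs)


def blended_cost_per_gpu_hour(allocs):
    """Spend divided by GPU-hours paid for (an hours-weighted average rate)."""
    hours = sum(a.gpu_hours_paid for a in allocs)
    return total_spend(allocs) / hours if hours else 0.0


def effective_cost_per_used_gpu_hour(allocs):
    """Spend divided by GPU-hours actually used; idle hours push this up."""
    used = sum(a.gpu_hours_used for a in allocs)
    return total_spend(allocs) / used if used else 0.0


def utilization_by_contract_type(allocs):
    """Used / paid GPU-hours, grouped by contract type."""
    paid, used = defaultdict(float), defaultdict(float)
    for a in allocs:
        paid[a.contract_type] += a.gpu_hours_paid
        used[a.contract_type] += a.gpu_hours_used
    return {k: used[k] / paid[k] for k in paid if paid[k]}


def idle_gpu_hours(allocs):
    """GPU-hours paid for but not used (never negative per allocation)."""
    return sum(max(a.gpu_hours_paid - a.gpu_hours_used, 0) for a in allocs)


# Illustrative numbers only, not real vendor data.
fleet = [
    Allocation("vendor-a", "committed", 1000, 900, 2.00),
    Allocation("vendor-b", "on-demand", 200, 150, 3.00),
    Allocation("vendor-c", "committed", 500, 300, 1.80),
]

if __name__ == "__main__":
    print(f"blended $/GPU-hr:       {blended_cost_per_gpu_hour(fleet):.4f}")
    print(f"effective $/used GPU-hr: {effective_cost_per_used_gpu_hour(fleet):.4f}")
    print("utilization by contract:", utilization_by_contract_type(fleet))
    print("idle GPU-hours:", idle_gpu_hours(fleet))
```

The sketch reports the blended paid rate and the effective rate per used GPU-hour side by side. This separates two effects: vendor pricing on one side, and cost leakage from idle capacity on the other.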