Manager, Cloud Infrastructure Demand & Capacity Planning
Listed on 2026-02-14
IT/Tech
Cloud Computing, IT Project Manager
Summary
Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other's ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better.
It's the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you'll do more than join something — you'll add something.
The Cloud Infrastructure Business Operations (CIBO) team is a centralized organization responsible for demand planning of infrastructure resources across Apple. We are seeking a Manager of ML Compute Capacity Planning to lead capacity planning efforts for Apple's ML Training and Gen AI platforms. These platforms provide services to all internal Apple developers, delivering efficient and scalable compute and processing for the machine learning lifecycle, from model experimentation to deployment, across the entire Apple consumer ecosystem.
We're looking for a strategic leader with deep expertise in capacity planning, demand forecasting, and infrastructure optimization for large-scale ML compute environments. In this role, you will build and lead a team of capacity planners responsible for ensuring Apple's ML and Gen AI infrastructure meets current and future demand. This includes developing long‑range capacity models, driving accelerator hardware strategy, managing supply/demand balance, and partnering with finance on investment planning.
You will serve as the central capacity planning voice for interactions with public cloud providers and internal Apple Cloud teams, ensuring Apple has the right compute resources, in the right place, at the right time, and at the right cost.
- Team Leadership & Development:
- Build, mentor, and lead a high‑performing team of capacity planners focused on ML and Gen AI compute infrastructure
- Establish team vision, goals, and operating rhythms that drive accountability and continuous improvement
- Foster a culture of analytical rigor, proactive planning, and cross‑functional collaboration
- Develop talent through coaching, career development, and succession planning
- Capacity Planning & Demand Forecasting:
- Own the end‑to‑end capacity planning function for Generalized ML Training and Gen AI compute accelerators
- Develop and maintain sophisticated demand forecasting models that translate ML and Gen AI workload growth projections into infrastructure requirements
- Establish capacity planning processes, tools, and dashboards that provide visibility into current utilization, future demand, and capacity gaps
- Lead monthly, quarterly, and annual capacity planning cycles, delivering actionable recommendations to senior leadership
- Supply/Demand Management:
- Drive supply/demand balance for ML and Gen AI compute platforms, ensuring optimal resource allocation and utilization
- Establish and monitor key capacity metrics (utilization, efficiency, cost‑per‑workload) to inform strategic decisions
- Infrastructure Strategy & Technology Evaluation:
- Lead technical and economic evaluations of emerging accelerator technologies (GPUs, TPUs, custom silicon), balancing power, performance, cost, and compatibility
- Develop price/performance models that inform hardware selection and capacity investment decisions
- Define partnership strategy across the ML and Gen AI ecosystem including public cloud providers, internal Apple Cloud, silicon vendors, and infrastructure partners
- Identify and advocate for new technologies and approaches that improve capacity efficiency and cost‑effectiveness
- Cross‑Functional Partnership & Execution:
- Establish strategic partnerships with Line of Business (LOB) engineering teams and stakeholders to understand workload requirements, growth trajectories, and service level expectations
- Align capacity plans with hardware procurement, data center expansion, networking, and software readiness teams
- Translate Generalized ML Training and…