Senior Technical Program Manager, Cloud Infrastructure
Listed on 2026-06-20
Overview
NVIDIA's deep learning platforms lead innovation across multiple industries. We are looking for a Technical Program Manager (TPM) to join our DGX Cloud team and help drive AI capacity and infrastructure worldwide.
What you'll be doing
As a DGX Cloud Technical Program Manager, you'll partner with Engineering, Infrastructure, and Software teams to manage critical programs that enable AI capacity and performance for our customers. Your work will help shape the foundational capabilities and processes for DGX Cloud, covering cluster capacity bring‑up for CPU, storage, networking, and compute requirements that support GPUs.
Responsibilities
- Working in close coordination with storage engineering and network engineering teams to define and communicate requirements to Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs). Driving alignment on a plan of record (POR) for capacity blocks based on workload needs.
- Driving early engagement with CSPs and NCPs to understand their managed storage and network solutions and to influence alignment with the NVIDIA Cloud roadmap.
- Gathering technical requirements, developing comprehensive roadmaps, establishing clear milestones, and ensuring adherence to our Product Lifecycle (PLC) process.
- Managing ongoing capacity operations and the engineering engagement with CSP and NCP partners, collaborating closely with engineering leads. Focusing on availability, maintenance, and other critical performance indicators.
- Partnering closely with teams within NVIDIA to understand workload requirements and related hardware and infrastructure needs, including speeds and feeds, to optimize infrastructure readiness with cloud vendors and NVIDIA Cloud Providers.
- Leveraging Jira and other program management platforms to instill rigor and structure in the management of engineering deliverables.
- Identifying and driving opportunities to adopt third‑party and in‑house cloud infrastructure solutions for deployments, support, security, compliance, and observability across DGX Cloud.
- Establishing key performance indicators (KPIs) and quantitatively demonstrating the value and impact delivered by your programs.
- Proactively identifying, resolving, and mitigating risks and issues that could affect scope, schedule, and quality across all program aspects.
- Encouraging a culture of continuous improvement, consistently identifying opportunities for process improvements within our cloud infrastructure operations.
Requirements
- 8+ years of technical program management experience. You have driven the planning and execution of large‑scale cloud infrastructure programs with outside organizations. You focus strongly on software engineering projects within a matrixed organization.
- Extensive hands‑on experience in cloud infrastructure, preferably gained from working at a major Cloud Service Provider (CSP).
- Domain knowledge in the bring‑up and end‑to‑end operations of compute, storage, and GPUs (including common failure points at the hardware and software levels).
- Expert‑level proficiency with Jira, Smartsheet, or similar program management tools, with the ability to guide engineering teams on their use of the tools.
- Outstanding strategic and tactical thinking abilities, coupled with a strong capacity to build consensus and drive program success.
- Comfort and effectiveness working and growing in ambiguous environments.
- Excellent communication and technical presentation skills, particularly for executive audiences.
- BS or MS in Electrical Engineering or Computer Science, or equivalent experience.
- In‑depth knowledge of NVIDIA GPU products, including deployment and bring‑up.
- Working knowledge of various cloud technologies (Kubernetes, API integration, Terraform, etc.).
- A highly enthusiastic, energetic, responsive, and passionate individual with a keen eye for identifying process improvement opportunities.
- Significant experience with productivity tools and process automation is a major plus.
- Deep familiarity with cloud‑native product and services environments, and with AI/ML infrastructure.
Base Salary Range:
Level 4: 168,000 USD – 258,750 USD;
Level 5: 200,000 USD – 322,000 USD. Eligible for equity and benefits.
NVIDIA is committed to fostering a diverse work environment and we do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.