Senior Manager, GPU Cloud Infrastructure - GeForce
Listed on 2026-06-17
IT/Tech
SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, IT Infrastructure, Network Engineer
GeForce NOW is the global leader in cloud gaming, dedicated to making high‑end play accessible on any device, from smartphones to VR headsets. We leverage NVIDIA’s premier data centers to stream over 2,000 games at up to 5K resolution and 120 FPS, ensuring a local‑feel experience with ultra‑low latency. Joining the GFN team means playing a vital role in advancing interactive entertainment at scale.
We are looking for a Senior Manager to lead the design, scaling, and operations of high‑performance networking for GPU‑based cloud infrastructure. This role is critical to enabling cloud gaming workloads, AI/ML training, and inference platforms by delivering ultra‑low‑latency, high‑throughput, and highly reliable interconnects across data centers and cloud environments.
What you will be doing:
Build and mentor a specialized team of network architects focused on high‑performance GPU infrastructure.
Oversee the design of intra‑cluster and inter‑cluster connectivity, utilizing RoCE, Ethernet‑based AI fabrics, and high‑bandwidth data center interconnects.
Drive technical tuning to reduce latency and jitter and to increase throughput, while implementing congestion control and packet‑loss mitigation strategies.
Define the roadmap for networking strategies that support gaming, AI/ML training, and real‑time inference at scale.
Engage with ISPs to optimize low‑latency edge networks and ensure a seamless connection from our data centers to end clients.
Implement Infrastructure as Code (IaC) and observability frameworks to automate provisioning, scaling, and real‑time cluster health monitoring.
Work directly with AI platform teams, hardware vendors, and SRE groups to influence technology direction and vendor selection.
Establish protocols for fault tolerance and lead incident response and root cause analysis for complex network issues.
12+ years of overall proven experience in networking, cloud infrastructure, or distributed systems, with 5+ years of experience directly managing technical teams.
Mastery of data center networking, including Clos/spine‑leaf architectures and high‑performance fabrics like RDMA, RoCE, or InfiniBand.
Hands‑on experience with BGP, EVPN/VXLAN, and kernel‑level development for routing and switching.
Skilled in using Ansible or Terraform for infrastructure automation, paired with monitoring tools like Prometheus and Grafana.
Practical experience designing for large‑scale configurations using SR‑IOV, Xen virtualization, or Open vSwitch.
Bachelor’s or Master’s degree in Computer Science or a related engineering field (or equivalent experience).
Ability to ensure all infrastructure meets rigorous internal policies and regulatory standards like GDPR.
Proven success managing networking for large‑scale GPU clusters or hyperscale cloud environments.
Familiarity with optical networking and high‑speed interconnects reaching 400G or 800G.
Experience in debugging and improving code for Mellanox/Cumulus Linux or managing Palo Alto and NetScaler appliances.
A strong grasp of streaming telemetry and operational signals (SNMP, Syslog) to proactively resolve complex architectural bottlenecks.
Relevant top‑tier certifications, such as CCIE or specialized cloud networking designations.
Salary range: $256,000 – $414,000 USD. You will also be eligible for equity and benefits.
Applications will be accepted at least until April 11, 2026.
This posting is for an existing vacancy.
NVIDIA uses AI tools in its recruiting processes.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.