Principal Deep Learning Communication Architect
Job in
Austin, Travis County, Texas, 78716, USA
Listed on 2026-06-18
Listing for:
NVIDIA
Full Time position
Job specializations:
- Software Development
AI Engineer (Applied/Software), Software Architect
Job Description & How to Apply Below
What You'll Be Doing
- Architecture Leadership:
Define the long‑term technical roadmap for communication libraries across NVIDIA’s next‑generation platforms. Ensure seamless scaling of models to clusters comprising hundreds of thousands of nodes.
- AI Communication Library Design:
Lead development of next‑generation communication primitives and collective algorithms. Optimize for heterogeneous interconnects such as NVLink, Spectrum‑X (Ethernet), and Quantum‑X (InfiniBand).
- Application‑Communication Library Co‑Design:
Partner with application developers to architect and implement specialized communication primitives. Ensure that AI and HPC libraries (including NCCL, NIXL, NVSHMEM, UCC, and UCX) evolve to meet the requirements of trillion‑parameter models and Agentic AI.
- Hardware/Software Co‑Design:
Collaborate with silicon architects and software engineers to influence hardware specifications for next‑generation networking, ensuring they meet the evolving demands of trillion‑parameter LLMs and Agentic AI.
- Quantitative Modeling:
Develop high‑fidelity analytical models and simulators to predict system behavior under emerging workloads.
- Ph.D. or M.S. in Computer Science, Electrical Engineering, or related field (or equivalent experience), with 12+ years of industry experience in high‑performance computing (HPC) or distributed deep learning.
- Parallelism Expertise:
Deep understanding of 3D parallelism (Data, Tensor, Pipeline) and advanced strategies including Context Parallelism, Expert Parallelism, and Zero Redundancy Optimizer (ZeRO) variants.
- Technical Proficiency:
Deep technical proficiency with NCCL, UCX, UCC, NVSHMEM, or MPI. Experience with RDMA, RoCE, and low‑level InfiniBand verbs is required.
- Inference & Serving:
Advanced knowledge of high‑throughput inference engines and schedulers, specifically TensorRT-LLM, vLLM, SGLang, and NVIDIA Dynamo.
- GPU Architecture:
Expert knowledge of the NVIDIA GPU memory hierarchy (HBM3e/HBM4, L2 cache) and CUDA programming models.
- Framework Development:
Hands‑on experience developing within Megatron‑Core, DeepSpeed, or JAX/XLA, with an understanding of how these frameworks interact with low‑level communication runtimes.
- Significant upstream contributions to major open‑source projects (e.g., PyTorch Distributed, KServe, or Ray).
- Proven track record of deploying and optimizing models on NVIDIA platforms or similar rack‑scale systems.
- Strong portfolio of patents or papers in top‑tier systems/architecture venues (e.g., ISCA, ASPLOS, NeurIPS, SC).
Base salary range: 272,000 USD – 431,250 USD. You will also be eligible for equity and benefits.
Applications will be accepted until April 18, 2026.
NVIDIA is committed to fostering a diverse work environment and is an equal‑opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.