Principal Deep Learning Communication Architect Job, Austin area, Texas, USA, Software Development

What You'll Be Doing

Architecture Leadership:
Define the long‑term technical roadmap for communication libraries across NVIDIA’s next‑generation platforms. Ensure seamless scaling of models to clusters comprising hundreds of thousands of nodes.
AI Communication Library Design:
Lead development of next‑generation communication primitives and collective algorithms. Optimize for heterogeneous interconnects such as NVLink, Spectrum‑X (Ethernet), and Quantum‑X (InfiniBand).
Application‑Communication Library Co‑Design:
Partner with application developers to architect and implement specialized communication primitives. Ensure that AI and HPC libraries, including NCCL, NIXL, NVSHMEM, UCC, and UCX, evolve to meet the requirements of trillion‑parameter models and Agentic AI.
Hardware/Software Co‑Design:
Collaborate with silicon architects and software engineers to influence hardware specifications for next‑generation networking, ensuring they meet the evolving demands of trillion‑parameter LLMs and Agentic AI.
Quantitative Modeling:
Develop high‑fidelity analytical models and simulators to predict system behavior under emerging workloads.

What We Need To See

Ph.D. or M.S. in Computer Science, Electrical Engineering, or related field (or equivalent experience), with 12+ years of industry experience in high‑performance computing (HPC) or distributed deep learning.
Parallelism Expertise:
Deep understanding of 3D parallelism (Data, Tensor, Pipeline) and advanced strategies including Context Parallelism, Expert Parallelism, and Zero Redundancy Optimizer (ZeRO) variants.
Technical Proficiency:
Deep technical proficiency with NCCL, UCX, UCC, NVSHMEM, or MPI. Experience with RDMA, RoCE, and low‑level InfiniBand verbs is required.
Inference & Serving:
Advanced knowledge of high‑throughput inference engines and schedulers, specifically TensorRT-LLM, vLLM, SGLang, and NVIDIA Dynamo.
GPU Architecture:
Expert knowledge of the NVIDIA GPU memory hierarchy (HBM3e/HBM4, L2 cache) and CUDA programming models.

Ways To Stand Out

Framework Development:
Hands‑on experience developing within Megatron‑Core, DeepSpeed, or JAX/XLA, with an understanding of how these frameworks interact with low‑level communication runtimes.
Significant upstream contributions to major open‑source projects (e.g., PyTorch Distributed, KServe, or Ray).
Proven track record of deploying and optimizing models on NVIDIA platforms or similar rack‑scale systems.
Strong portfolio of patents or papers in top‑tier systems/architecture venues (e.g., ISCA, ASPLOS, NeurIPS, SC).

Base salary range: 272,000 USD – 431,250 USD. You will also be eligible for equity and benefits.

Applications will be accepted until April 18, 2026.

NVIDIA is committed to fostering a diverse work environment and is an equal‑opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.