DGX Cloud Performance Engineer - New Grad
Listed on 2026-01-01
Software Development
AI Engineer, Software Engineer
Location: California
NVIDIA DGX™ Cloud is an end‑to‑end, scalable AI platform for developers, built on the latest NVIDIA architecture and co‑engineered with the world’s leading cloud service providers (CSPs). We’re looking for highly skilled Parallel and Distributed Systems engineers to lead the performance analysis, optimization, and modeling that define the architecture and design of NVIDIA’s DGX Cloud clusters.
What You Will Be Doing
- Develop benchmarks and end‑to‑end customer applications that run at scale, instrumented for performance measurement, tracking and sampling, to measure and optimize the performance of critical applications and services;
- Design carefully controlled experiments to analyze, study and develop critical insights into performance bottlenecks and dependencies from an end‑to‑end perspective;
- Propose improvements to end‑to‑end system performance and usability by driving changes in hardware or software (or both);
- Collaborate with AI researchers, developers and service providers to understand developer and customer pain points, requirements and future needs, and to share best practices;
- Develop modeling frameworks and total cost of ownership (TCO) analyses to enable efficient exploration and sweep of the architecture and design space;
- Define the methodology needed to drive engineering analysis to inform the architecture, design and roadmap of DGX Cloud.
What We Need To See
- Currently pursuing a Master’s or Ph.D. degree in Engineering (preferably Electrical Engineering, Computer Engineering or Computer Science), or equivalent experience;
- Experience working with large‑scale parallel and distributed accelerator‑based systems;
- Experience optimizing the performance of AI workloads on large‑scale systems;
- Background in performance modeling and benchmarking at scale;
- Background in Computer Architecture, Networking, Storage systems or Accelerators;
- Familiarity with popular AI frameworks (PyTorch, TensorFlow, JAX, Megatron‑LM, TensorRT‑LLM, vLLM) and related tools;
- Experience with AI/ML models and workloads, especially large language models (LLMs), and an understanding of DNNs and their use in emerging AI/ML applications;
- Proficiency in Python and C/C++;
- Expertise with at least one public CSP infrastructure (GCP, AWS, Azure, OCI, …).
Ways To Stand Out From The Crowd
- Ph.D. in a relevant area;
- Strong intellectual curiosity, the confidence to dig in and confront complexity, and the ability to pick up new areas quickly;
- Proficiency in CUDA and XLA;
- Excellent interpersonal skills.
Base salary: $148,000 – $235,750 USD (Level 3) and $184,000 – $287,500 USD (Level 4). Equity and benefits are also included.
Applications will be accepted at least until December 19, 2025.
NVIDIA is committed to fostering a diverse work environment and is proud to be an equal‑opportunity employer. NVIDIA does not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.