
Senior AI Infrastructure Engineer - DGX Cloud

Job in California, Moniteau County, Missouri, 65018, USA
Listing for: NVIDIA
Full Time position
Listed on 2025-12-02
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
Location: California

Overview

NVIDIA is looking for an outstanding, passionate, and talented Senior AI Infrastructure Engineer to join our DGX Cloud group. In this engineering role you will design, build, and maintain large-scale production systems with high efficiency and availability using software and systems engineering practices. The role requires knowledge across systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open source cloud technologies such as Kubernetes and OpenStack.

DGX Cloud SRE at NVIDIA ensures that internal- and external-facing GPU cloud services run with maximum reliability and uptime, while carefully planning changes and managing capacity and performance. NVIDIA values diversity, curiosity, problem solving, and openness, and fosters collaboration, risk-taking in a blame-free environment, self-direction to work on meaningful projects, and mentorship for learning and growth.

What You’ll Be Doing
  • Design, build, deploy, and run internal tooling for large-scale AI training and inference platforms built on top of cloud infrastructure
  • Conduct in-depth performance characterization and analysis on large multi-GPU and multi-node clusters
  • Engage in and improve the full lifecycle of services—from inception and design through deployment, operation and refinement
  • Support services before they go live through activities such as system design consulting, developing software tools, platforms and frameworks, capacity management and launch reviews
  • Maintain services once they are live by measuring and monitoring availability, latency and overall system health
  • Scale systems sustainably through automation and evolve systems to improve reliability and velocity
  • Practice sustainable incident response and blameless postmortems
  • Be part of an on-call rotation to support production systems
What We Need To See
  • BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent experience
  • 6+ years of experience
  • A track record showing initiative, collaboration, and ability to work with others on projects
  • Experience with infrastructure automation and distributed systems design for large-scale private or public cloud systems in production
  • Experience in one or more of Python, Go, C/C++, Java
  • In-depth knowledge of Linux, networking, storage, and container technologies
  • Experience with public cloud and Infrastructure as Code (IaC) tools such as Terraform
  • Distributed systems experience
Ways To Stand Out From The Crowd
  • Interest in crafting, analyzing and fixing large-scale distributed systems
  • Systematic problem-solving approach with strong communication, ownership, and drive
  • Ability to debug and optimize code and automate routine tasks; experience with large private and public cloud systems based on Kubernetes or Slurm

Salary and benefits information is provided to eligible applicants. Applications for this job will be accepted at least until September 24, 2025. NVIDIA is committed to fostering a diverse work environment and is an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.

Position Requirements
10+ Years work experience