AI and Systems Software Intern, At Scale AI - Fall 2026

Job in Santa Clara, Santa Clara County, California, 95053, USA
Listing for: NVIDIA Gruppe
Apprenticeship/Internship position
Listed on 2026-06-02
Job specializations:
  • Software Development
    Software Engineer, DevOps
Salary/Wage Range or Industry Benchmark: 20 - 71 USD hourly
Job Description & How to Apply Below
Position: AI and Systems Software Intern, At Scale AI - Fall 2026

What you’ll be doing:

  • Investigate and triage failures within large-scale compute clusters, performing deep‑dive analysis to distinguish between software glitches, configuration errors, and hardware faults.
  • Analyze logs and telemetry to correlate specific job failures to system‑level issues and diagnostic test failures, helping to reduce noise and identify root causes.
  • Assist with tracking, calculating, and reporting key reliability metrics, specifically Mean Time Between Failures (MTBF) and Mean Time Between Interruptions (MTBI), to drive infrastructure improvements.
  • Assist in analyzing large‑scale workload issues, searching for application and infrastructure improvement opportunities to ensure jobs run as fast and reliably as possible.
  • Work closely with a mentor to learn about hardware validation suite architecture, document debugging methodologies, and help the team make intelligent, data‑backed engineering decisions.
What we need to see from you:
  • Pursuing a BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or a related field.
  • Proficiency in Python and Bash/Shell scripting for automation and tool development.
  • Proven debugging skills with an ability to isolate issues in complex, distributed systems.
  • Exposure to high‑performance computing (HPC) environments, cluster managers (e.g., Slurm, Kubernetes), or large‑scale distributed systems.
Ways to stand out:
  • Familiarity with server architecture (PCIe, NVLink, CPU/GPU interactions) and hardware diagnostics.
  • Experience with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
  • Familiarity with system profiling and debugging tools (e.g., strace, gdb, perf).
  • Experience running and analyzing standard industry benchmarks on Linux systems.
  • Desire to learn and be part of a committed and hardworking team with excellent collaboration and communication skills.
  • Ability to multitask effectively in a dynamic, high‑performance environment.

Internship hourly rates are standardized based on the position, your location, year in school, degree, and experience. The hourly rate for our interns is 20 USD - 71 USD.

You will also be eligible for Intern benefits.

Applications for this job will be accepted at least until May 31, 2026.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
