AI Benchmarking and Telemetry Engineer - NVIS

Job in Virginia, St. Louis County, Minnesota, 55792, USA
Listing for: NVIDIA
Full Time position
Listed on 2026-04-22
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Systems Engineer, Data Engineer
Salary/Wage Range or Industry Benchmark: 224,000 - 356,500 USD yearly
Job Description & How to Apply Below

NVIDIA has been transforming computer graphics, PC gaming, and accelerated computing for more than 25 years. It’s a unique legacy of innovation that’s fueled by great technology—and amazing people. Today, we’re tapping into the unlimited potential of AI to define the next era of computing. An era in which our GPU acts as the brains of computers, robots, and self‑driving cars that can understand the world.

Doing what’s never been done before takes vision, innovation, and the world’s best talent. As an NVIDIAN, you’ll be immersed in a diverse, supportive environment where everyone is inspired to do their best work. Come join the team and see how you can make a lasting impact on the world.

What You'll Be Doing
  • Design benchmarking methods for high‑performance computing and AI workloads, and run them to completion on large‑scale GPU clusters.
  • Develop and maintain telemetry infrastructure to capture host‑level GPU/CPU performance data, network fabric metrics, and power/thermal characteristics within the facility.
  • Collaborate closely with hardware engineering, software development, and customer‑facing teams to define performance requirements, fix bottlenecks, and validate configurations against real‑world workloads.
  • Deploy and manage observability stacks such as Prometheus, Grafana, NVIDIA’s DCGM, and custom telemetry solutions to provide actionable insights into cluster health, utilization, and performance trends.
  • Work directly with engineering partners to understand performance requirements, conduct on‑site benchmarking engagements, and deliver detailed analysis and recommendations for workload optimization.
  • Maintain extensive knowledge of industry‑standard benchmarks in advanced computing and machine learning (e.g., HPL, HPCG, MLPerf, NCCL) and contribute to developing new benchmarking methodologies for emerging workloads.
What We Need To See
  • Bachelor’s degree in Computer Science, Electrical Engineering, Computer Engineering, or a related field (or equivalent experience).
  • 8+ years of direct experience working with HPC and/or AI infrastructure, including cluster deployment, performance analysis, and benchmarking.
  • Deep expertise in Linux system administration, including kernel tuning, process scheduling, storage I/O optimization, and solving performance issues at scale.
  • Proven experience crafting and implementing telemetry and monitoring solutions for large‑scale distributed systems (Prometheus, Grafana, DCGM, collectd, or similar).
  • Solid grasp of GPU architectures, CUDA programming principles, and GPU performance traits in high‑performance computing and artificial intelligence workloads.
  • Familiarity with job schedulers (Slurm, PBS, LSF) and container orchestration platforms (Kubernetes, Docker) in HPC/AI environments.
  • Proficiency in Python, Bash, and other scripting languages for automation, data analysis, and workflow orchestration.
  • Excellent analytical and problem‑solving skills with the ability to interpret complex performance data and communicate findings to both technical and non‑technical audiences.
Ways To Stand Out From The Crowd
  • Experience with high‑performance networking technologies such as InfiniBand, RoCE, and Ethernet, including fabric tuning and performance analysis.
  • Knowledge of parallel file systems (Lustre, GPFS, BeeGFS, Weka, VAST), including performance tuning and benchmarking.
  • Background in power and thermal management for high‑density compute environments (PUE optimization, liquid cooling).
  • Contributions to open‑source benchmarking tools or performance analysis frameworks.
  • Industry certifications such as RHCE, CKA, or vendor‑specific HPC/data‑center credentials.

NVIDIA offers competitive salaries and a comprehensive benefits package. The base salary range is $184,000 – $287,500 for Level 4 and $224,000 – $356,500 for Level 5. The role is also eligible for equity and benefits.

NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and is proud to be an equal opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
