AI Inference Performance Engineer - College Grad

Job in Santa Clara, Santa Clara County, California, 95053, USA
Listing for: NVIDIA Gruppe
Full Time position
Listed on 2026-06-17
Job specializations:
  • Software Development
    AI Engineer (Applied/Software), Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below
Position: AI Inference Performance Engineer - New College Grad 2026

Overview

We optimize and benchmark GenAI inference on NVIDIA's latest accelerators, defining performance standards across language models, video generation, and speech workloads. We work within TensorRT-LLM, SGLang, and vLLM, building tools that evaluate serving performance. This team sits at the intersection of GPU performance engineering and public accountability.

Responsibilities
  • Drive industry benchmark results: own the end-to-end optimization pipeline, and implement and integrate optimizations in quantization, scheduling, memory management, and distributed inference across TensorRT-LLM, SGLang, and vLLM.
  • Define and optimize cutting-edge workloads: identify and shape next-generation inference benchmarks covering multi-turn coding, agentic workflows, and other emerging AI use cases. Collaborate with framework and kernel teams to push performance to its limits on large-scale LLM-MoE models, vision-language models, video diffusion models, recommendation, and speech workloads.
  • Architect distributed inference: design and optimize execution from a single GPU to rack-scale clusters, managing performance across many GPUs.
  • Establish performance methodology: apply roofline analysis and systematic profiling to decompose bottlenecks across CUDA kernels, frameworks, and serving layers.
  • Influence the ecosystem: contribute to TensorRT-LLM, vLLM, SGLang, and other open-source projects. Partner with architecture, kernel, and compiler teams to shape GPU roadmaps based on real workload data.
  • Technical leadership: raise the technical bar for the team, drive cross-functional execution on tight benchmark timelines, and lead a world-class team.

Qualifications
  • BS, MS, or PhD in Computer Science, Computer Engineering, Electrical Engineering, or equivalent experience.
  • 2+ years of relevant software development experience.
  • Strong Python or C++ programming, software design, and software engineering skills.
  • Expertise with a DL framework such as PyTorch or JAX.
  • Proven track record of delivering measurable performance improvements in deep learning inference or high-performance systems.
  • Deep understanding of LLM/VLM architectures and inference mechanics: attention, KV caching, batching strategies, decode-phase bottlenecks, speculative decoding, disaggregated serving, etc.