Senior Deep Learning Software Engineer, TensorRT Job, Santa Clara area, California, USA, Software Development

Position: Senior Deep Learning Software Engineer, TensorRT Performance

We are now looking for a Senior Deep Learning Software Engineer, TensorRT Performance! NVIDIA is seeking an experienced Deep Learning Engineer passionate about analyzing and improving the performance of NVIDIA’s inference ecosystem. NVIDIA is rapidly growing its research and development for Deep Learning Inference and is seeking excellent Software Engineers at all levels of expertise to join our team.

What you’ll be doing

Establish groundbreaking performance benchmarking methodologies and analysis workflows, and identify performance issues and opportunities for NVIDIA’s inference ecosystem (e.g., TensorRT, TensorRT-EdgeLLM, Torch-TensorRT).
Contribute features and code to NVIDIA/OSS inference frameworks including but not limited to TensorRT, TensorRT-EdgeLLM, and Torch-TensorRT.
Develop new, performance‑optimized model pipelines for NVIDIA’s inference ecosystem, in areas including but not limited to quantization, scheduling, memory management, and distributed inference, to set the gold standard for Gen AI performance.
Work with cross‑functional teams inside and outside of NVIDIA across generative AI, automotive, robotics, image understanding, and speech understanding to set directions and develop innovative inference solutions.
Scale performance of deep learning models across different architectures and types of NVIDIA accelerators.

What we need to see

Bachelor’s, Master’s, Ph.D., or equivalent experience in relevant fields (Computer Science, Computer Engineering, EECS, AI).
At least 3 years of relevant software development experience.
Strong C++ and Python programming and software engineering skills.
Experience with deep learning frameworks (e.g., PyTorch, JAX, TensorFlow, ONNX) and inference libraries (e.g., TensorRT, TensorRT‑LLM, vLLM, SGLang, FlashInfer).
Experience with performance analysis and performance optimization.

Ways to stand out from the crowd

Strong foundation and architectural knowledge of GPUs.
Deep understanding of modern deep learning models and workloads (e.g., Transformers, Recommenders, ASR, TTS, Visual Understanding).
Proficiency in one of the domain‑specific languages for deep learning programming (e.g., CUDA, TileIR, CuTeDSL, CUTLASS, Triton).
Prior contributions to major LLM inference frameworks (e.g., vLLM) or prior experience with graph compilers in deep learning inference (e.g., TorchDynamo, TorchInductor).
Prior experience optimizing performance for low‑latency, resource‑constrained systems or embedded AI pipelines (e.g., Jetson systems or other edge AI accelerators).

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 152,000 USD–241,500 USD for Level 3 and 184,000 USD–287,500 USD for Level 4. You will also be eligible for equity and benefits.

Applications for this job will be accepted until March 26, 2026.

NVIDIA is committed to fostering a diverse work environment and is a proud equal‑opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
