Manager, Language Model Inference
Listed on 2026-01-07
Software Development
AI Engineer, Software Engineer, Machine Learning/ ML Engineer
At NVIDIA, we aren't just powering the AI revolution; we're accelerating it. The TensorRT inference platform is the backbone of modern AI, delivering the industry's fastest and most efficient deployment of cutting-edge deep learning models on every NVIDIA GPU. With demand for AI exploding, particularly in the realm of large language models (LLMs) and vision language models (VLMs, VLAs), we are significantly expanding our team. We're seeking a highly skilled and driven Engineering Manager to take the lead in developing the next generation of LLM/VLM/VLA inference software technologies that will define the future of AI.
This is a high-impact, hands-on leadership role at the intersection of deep technical expertise and world-class management. You won't just manage; you'll architect and guide a brilliant team of engineers who are building the core LLM inference runtime. Your work will be highly collaborative, interfacing directly with NVIDIA Researchers, GPU Architects, and other teams across the company to ensure we ship production-grade, lightning-fast software that sets the global standard for AI performance.
What You'll Be Doing:
- Lead and grow a team responsible for specialized kernel development, runtime optimizations, and frameworks for LLM inference.
- Drive the design, development, and delivery of production inference software, targeting NVIDIA's next-generation enterprise and edge hardware platforms.
- Integrate cutting-edge technologies developed at NVIDIA and offer an intuitive developer experience for LLM deployment.
- Lead software development execution, with responsibility for project planning, milestone delivery, and cross-functional coordination.
What We Need to See:
- MS, PhD, or equivalent experience in Computer Science, Computer Engineering, AI, or a related technical field.
- 7+ years of overall software engineering experience, including 3+ years of technical leadership experience.
- Proven ability to lead and scale high-performing engineering teams, especially across distributed and cross-functional groups.
- Strong background in C++ or Python, with expertise in software design and delivering production-quality software libraries.
- Demonstrated expertise in large language models (LLMs) and/or vision language models (VLMs).
- Deep understanding of GPU architecture, CUDA programming, and system-level performance tuning.
- Background in LLM inference or working with frameworks such as TensorRT-LLM, vLLM, or SGLang.
- Passion for building scalable, user-friendly APIs and enabling developers in the AI ecosystem.
- Proven track record of growing and managing a team that encourages idea sharing, empowers team members, and provides opportunities for professional growth.
#LI-Hybrid
Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 184,000 USD - 287,500 USD for Level 2, and 224,000 USD - 356,500 USD for Level 3.
You will also be eligible for equity and benefits.
Applications for this job will be accepted at least until November 4, 2025.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.