Deep Learning Solutions Architect – Inference Optimization
Listed on 2026-01-01
IT/Tech
AI Engineer
NVIDIA’s Worldwide Field Operations (WWFO) team is seeking a Solutions Architect with a deep understanding of neural network inference. The role involves guiding customers on advanced inference techniques such as speculative decoding, request-scheduler optimizations, and FP4 quantization, and leveraging tools such as TensorRT-LLM, vLLM, and SGLang. The ideal candidate will have strong systems knowledge to help customers fully utilize new NVL72 systems and optimize inference pipelines for hybrid or diffusion models.
- Work directly with key customers to understand their technology and provide the best AI solutions.
- Perform in‑depth analysis and optimization to ensure the best performance on NVIDIA GPU systems, especially Grace/Arm-based systems, including large‑scale inference pipeline optimization.
- Partner with Engineering, Product and Sales teams to develop and plan optimal solutions for customers. Enable development and growth of product features through customer feedback and proof‑of‑concept evaluations.
- Excellent verbal, written communication, and technical presentation skills in English.
- MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields.
- 5+ years of work or research experience with Python/C++/other software development.
- Work experience and knowledge of modern NLP, including transformer, state-space, diffusion, or MoE model architectures. Expertise in training or optimization/compression/operation of DNNs is preferred.
- Understanding of key libraries used for NLP/LLM training, such as Megatron‑LM, NeMo, and DeepSpeed, and deployment libraries such as TensorRT‑LLM, vLLM, and Triton Inference Server.
- Enthusiastic about collaborating with various teams and departments—such as Engineering, Product, Sales, and Marketing—and thrives in dynamic environments.
- Self‑starter with a growth mindset, passion for continuous learning and sharing findings across the team.
- Demonstrated experience in running and debugging large‑scale distributed deep learning training or inference processes.
- Experience working with larger transformer‑based architectures for NLP, CV, ASR or other domains.
- Experience applying NLP technology in production environments.
- Proficiency with DevOps tools including Docker, Kubernetes, and Singularity.
- Understanding of HPC systems: data center design, high‑speed InfiniBand interconnects, cluster storage and scheduling, or related management experience.
NVIDIA offers highly competitive salaries and a comprehensive benefits package.
NVIDIA is committed to fostering a diverse work environment and is an equal‑opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
- Seniority level: Mid‑Senior level
- Employment type: Full‑time
- Industries: Computer Hardware Manufacturing, Software Development, and Computers and Electronics Manufacturing