Deep Learning Solutions Architect – Inference Optimization
Listed on 2026-01-01
IT/Tech
AI Engineer
NVIDIA’s Worldwide Field Operations (WWFO) team is seeking a Solutions Architect with a deep understanding of neural network inference. The role involves guiding customers on advanced inference techniques such as speculative decoding, request-scheduler optimizations, and FP4 quantization, and leveraging tools such as TensorRT-LLM, vLLM, and SGLang. The ideal candidate will have strong systems knowledge to help customers fully utilize new NVL72 systems and optimize inference pipelines for hybrid or diffusion models.
- Work directly with key customers to understand their technology and provide the best AI solutions.
- Perform in‑depth analysis and optimization to ensure the best performance on NVIDIA GPU systems, especially Grace/Arm-based systems, including large‑scale inference pipeline optimization.
- Partner with Engineering, Product and Sales teams to develop and plan optimal solutions for customers. Enable development and growth of product features through customer feedback and proof‑of‑concept evaluations.
- Excellent verbal, written communication, and technical presentation skills in English.
- MS/PhD or equivalent experience in Computer Science, Data Science, Electrical/Computer Engineering, Physics, Mathematics, or other Engineering fields.
- 5+ years of work or research experience with Python/C++/other software development.
- Work experience and knowledge of modern NLP, including transformer, state-space, diffusion, or MoE model architectures. Expertise in training or optimization/compression/operation of DNNs is preferred.
- Understanding of key libraries used for NLP/LLM training, such as Megatron‑LM, NeMo, and DeepSpeed, and deployment libraries such as TensorRT‑LLM, vLLM, and Triton Inference Server.
- Enthusiastic about collaborating with various teams and departments—such as Engineering, Product, Sales, and Marketing—and thrives in dynamic environments.
- Self‑starter with a growth mindset, passion for continuous learning and sharing findings across the team.
- Demonstrated experience in running and debugging large‑scale distributed deep learning training or inference processes.
- Experience working with larger transformer‑based architectures for NLP, CV, ASR or other domains.
- Experience applying NLP technology in production environments.
- Proficiency with DevOps tools including Docker, Kubernetes, and Singularity.
- Understanding of HPC systems: data center design, high‑speed InfiniBand interconnects, cluster storage and scheduling, or related management experience.
NVIDIA offers highly competitive salaries and a comprehensive benefits package.
NVIDIA is committed to fostering a diverse work environment and is an equal‑opportunity employer. We do not discriminate on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
- Seniority level: Mid‑Senior level
- Employment type: Full‑time
- Industries: Computer Hardware Manufacturing, Software Development, and Computers and Electronics Manufacturing