Machine Learning Engineer – Model Optimization & Quantization
Listed on 2026-06-26
IT/Tech
Machine Learning/ML Engineer, AI Engineer (Applied/Software), Data Scientist
Overview
Join the Qualcomm AI Hub team and help developers integrate machine learning into their products and experiences. In this role, you will develop tools to help developers optimize and deploy machine learning models on edge and mobile hardware. AIMET is Qualcomm's open-source library for state-of-the-art model quantization and compression techniques. You will develop and support cutting-edge model optimization workflows, pushing the boundary of what is possible on resource-constrained hardware.
Applications range from quantizing large language models (LLMs) and generative AI models to compressing latency-critical vision, audio, and multimodal networks for deployment on Qualcomm Snapdragon and other edge SoCs.
We are seeking a talented and motivated Staff Software Engineer with expertise in optimizing and deploying ML models — especially for edge devices.
What You’ll Do
- Design, develop, and maintain quantization algorithms and compression pipelines within the AIMET framework (PTQ, QAT, mixed-precision, AdaScale, etc.)
- Implement advanced quantization techniques including weight-only quantization, activation quantization, KV-cache quantization, and sub-4-bit quantization for LLMs and generative AI models
- Build tooling to analyze, profile, and debug model accuracy degradation caused by quantization
- Integrate AIMET workflows with popular ML frameworks — PyTorch and ONNX
- Develop APIs and developer-facing tooling to make AIMET accessible and easy to use for external customers and design partners
- Integrate AIMET into the AI Hub Workbench Quantize job to enable quantization at scale
- Own end-to-end quantization and optimization of models published on Qualcomm AI Hub, ensuring they meet accuracy, latency, and power targets on Qualcomm hardware
- Quantize and validate a broad range of model families — vision transformers, LLMs, diffusion models, speech, and multimodal architectures — for deployment via AI Hub
- Develop and maintain automated quantization pipelines and evaluation harnesses to scale model onboarding across AI Hub’s growing model catalog
- …Bachelor’s degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of relevant experience…
- …Master’s degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of relevant experience…
- …PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of relevant experience…
- 3+ years of industry experience in machine learning, deep learning, or AI infrastructure
- Strong proficiency in Python, with hands‑on experience in PyTorch, ONNX, and/or TensorFlow
- Solid understanding of neural network architectures — CNNs, Transformers, LLMs, diffusion models, multimodal models
- Experience with model quantization techniques — PTQ, QAT, weight-only quantization, mixed-precision, sub-4-bit methods
- Hands‑on experience quantizing LLMs (GPT, LLaMA, Mistral, Falcon, or similar families) for inference optimization
- Familiarity with AIMET, GPTQ, AWQ, SmoothQuant, or similar frameworks is a strong plus
- Experience working with ONNX, TFLite/LiteRT, or other model interchange formats
- Understanding of hardware constraints: memory bandwidth, compute precision (INT4/INT8/FP16/BF16), and NPU/DSP execution
- Experience collaborating across teams or BUs to drive technical alignment and model delivery
- Proficiency with git and software development best practices
- Strong written and verbal communication skills, including the ability to write clean APIs and documentation and to engage directly with external developers
- Experience with C++ for performance-critical components is a bonus
- Familiarity with ARM processors and mobile SoC architecture (Snapdragon) is a plus
- Experience with automated evaluation pipelines and model benchmarking at scale is a plus
- Works independently with minimal supervision
- Provides technical guidance and mentorship to other team members
- Decision-making is significant and affects work beyond the immediate team
- Requires strong communication skills to convey complex quantization concepts to varied audiences — from hardware engineers and BU partners to…