
Machine Learning Engineer – Model Optimization & Quantization

Job in Santa Clara, Santa Clara County, California, 95053, USA
Listing for: Qualcomm
Full Time position
Listed on 2026-06-26
Job specializations:
  • IT/Tech
    Machine Learning/ML Engineer, AI Engineer (Applied/Software), Data Scientist
Salary/Wage Range or Industry Benchmark: 158400 - 237600 USD Yearly
Job Description & How to Apply Below
Position: Staff Machine Learning Engineer – Model Optimization & Quantization

Overview

Join the Qualcomm AI Hub team and help developers integrate machine learning into their products and experiences. In this role, you will develop tools that help developers optimize and deploy machine learning models on edge and mobile hardware. AIMET is Qualcomm's open-source library for state-of-the-art model quantization and compression techniques. You will develop and support cutting-edge model optimization workflows, pushing the boundary of what is possible on resource-constrained hardware.

Applications range from quantizing large language models (LLMs) and generative AI models to compressing latency-critical vision, audio, and multimodal networks for deployment on Qualcomm Snapdragon and other edge SoCs.

We are seeking a talented and motivated Staff Software Engineer with expertise in optimizing and deploying ML models — especially for edge devices.

What You’ll Do
  • Design, develop, and maintain quantization algorithms and compression pipelines within the AIMET framework (PTQ, QAT, mixed-precision, AdaScale, etc.)
  • Implement advanced quantization techniques including weight-only quantization, activation quantization, KV-cache quantization, and sub-4-bit quantization for LLMs and generative AI models
  • Build tooling to analyze, profile, and debug model accuracy degradation caused by quantization
  • Integrate AIMET workflows with popular ML frameworks — PyTorch and ONNX
  • Develop APIs and developer-facing tooling to make AIMET accessible and easy to use for external customers and design partners
  • Integrate AIMET into the AI Hub Workbench Quantize job to enable quantization at large scale
  • Own end-to-end quantization and optimization of models published on Qualcomm AI Hub, ensuring they meet accuracy, latency, and power targets on Qualcomm hardware
  • Quantize and validate a broad range of model families — vision transformers, LLMs, diffusion models, speech, and multimodal architectures — for deployment via AI Hub
  • Develop and maintain automated quantization pipelines and evaluation harnesses to scale model onboarding across AI Hub’s growing model catalog
Minimum Qualifications
  • …Bachelor’s degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of relevant experience…
  • …Master’s degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of relevant experience…
  • …PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of relevant experience…
Preferred Qualifications
  • 3+ years of industry experience in machine learning, deep learning, or AI infrastructure
  • Strong proficiency in Python, with hands‑on experience in PyTorch, ONNX and/or TensorFlow
  • Solid understanding of neural network architectures — CNNs, Transformers, LLMs, diffusion models, multimodal models
  • Experience with model quantization techniques — PTQ, QAT, weight-only quantization, mixed-precision, sub-4-bit methods
  • Hands‑on experience quantizing LLMs (GPT, LLaMA, Mistral, Falcon, or similar families) for inference optimization
  • Familiarity with AIMET, GPTQ, AWQ, SmoothQuant, or similar frameworks is a strong plus
  • Experience working with ONNX, TFLite/LiteRT, or other model interchange formats
  • Understanding of hardware constraints: memory bandwidth, compute precision (INT4/INT8/FP16/BF16), and NPU/DSP execution
  • Experience collaborating across teams or BUs to drive technical alignment and model delivery
  • Proficiency with git and software development best practices
  • Strong written and verbal communication skills — ability to write clean APIs, documentation, and engage directly with external developers
  • Experience with C++ for performance-critical components is a bonus
  • Familiarity with ARM processors and mobile SoC architecture (Snapdragon) is a plus
  • Experience with automated evaluation pipelines and model benchmarking at scale is a plus
Level of Responsibility
  • Works independently with minimal supervision
  • Provides technical guidance and mentorship to other team members
  • Decision-making is significant and affects work beyond the immediate team
  • Requires strong communication skills to convey complex quantization concepts to varied audiences — from hardware engineers and BU partners to…