Sr Software Engineer, AI Tools Device Generative AI Model Optimization
Job in
San Diego, San Diego County, California, 92189, USA
Listed on 2026-06-26
Listing for:
Qualcomm
Full Time position
Job specializations:
- IT/Tech
- AI Engineer (Applied/Software), Machine Learning/ML Engineer
Job Description & How to Apply Below
Company: Qualcomm Technologies, Inc.
Job Area: Engineering Group, Machine Learning Engineering
As a leading technology innovator, Qualcomm pushes the boundaries of what’s possible to enable next‑generation AI experiences and drive agentic transformation, creating a smarter, connected future for all. As a Qualcomm Machine Learning Engineer, you will develop and implement cutting‑edge tools and solutions to enable state‑of‑the‑art AI solutions across various technology verticals.
All Qualcomm employees are expected to actively support diversity on their teams and within the Company.
Location
This role is open to both San Diego, CA and Raleigh, NC and will be onsite full‑time.
What You’ll Do
Model Reauthoring & Architecture Adaptation
- Reauthor generative AI architectures for efficient execution on Qualcomm AI hardware. This covers LLMs (Llama, Phi, Qwen) and multimodal models (vision‑language, speech, diffusion), including custom attention, normalization, positional embedding, and modality‑specific components.
- Translate hardware execution constraints — operator support, memory layout, dispatch behavior — into model‑level transformations. These transformations need to preserve accuracy while enabling efficient on‑device execution.
- Build clean extension points so internal teams and external contributors can onboard new architectures without changing core pipeline code.
- Integrate inference acceleration techniques into the model preparation pipeline. This includes memory‑efficient attention, decode acceleration, and serving‑time optimizations.
- Translate end‑customer deployment constraints — target SoC, context length, latency budget, memory envelope — into concrete model preparation strategies.
- Work with research teams to develop reauthoring strategies for custom OEM models and customer‑specific use cases. Take research prototypes and turn them into production deployments.
- Partner with compiler teams to understand on‑target constraints. Decide on the right response: a graph‑level optimization or model‑level reauthoring.
- Partner with quantization engineers so architectural decisions compose cleanly with the quantization stack.
- Contribute reauthoring and adaptation stages to a multi‑stage model preparation pipeline. Build developer‑facing diagnostics that give clear, actionable feedback when models fail to lower or run efficiently.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or related field and 4+ years of Software Engineering, ML Engineering, or related experience.
- OR Master’s degree in Computer Science, Engineering, or related field and 3+ years of relevant experience.
- OR PhD in Computer Science, Engineering, or related field and 2+ years of relevant experience.
- 2+ years in ML systems, model optimization, or inference engineering.
- Proficient in Python in large, typed codebases.
- Strong written and verbal communication. Comfortable operating across compiler, research, and partner‑facing teams.
- Deep implementation‑level knowledge of generative AI architectures across LLMs and multimodal models.
- Demonstrated experience optimizing inference for edge or resource‑constrained deployments, with measurable latency or memory wins to point to.
- Strong PyTorch internals knowledge — module customization, export flows, tracing. Familiarity with the Hugging Face transformers ecosystem.
- Familiarity with on‑device runtimes and SoC‑level constraints (memory bandwidth, compute precision, NPU/DSP execution). Exposure to QAIRT/QNN, ONNXRuntime, LiteRT‑LLM or similar is a plus.
- Working understanding of how quantization interacts with model architecture decisions, even if you’re not a quantization specialist.
- Experience using agentic coding tools such as GitHub Copilot, Cursor, Claude Code, Codeium, or similar AI‑assisted development tools to improve coding productivity and problem‑solving.
- Works independently on open‑ended optimization challenges. Provides technical guidance and mentorship to teammates.
- Decisions have broad impact on model…