AI Model Optimization Architect Job: San Diego area, California, USA (IT/Tech)

Company

Qualcomm Technologies, Inc.

Job Area

Engineering Group > Machine Learning Engineering

General Summary

Qualcomm is leveraging its strengths in compute, connectivity, and AI acceleration to play a central role in the evolution of Cloud AI. The Qualcomm Cloud AI team develops hardware and software platforms enabling efficient inference of large-scale foundation models.

Position Overview

We are seeking a Staff Engineer – AI Model Optimization Architect to lead end-to-end model transformation and optimization for LLMs, VLMs, diffusion, and multimodal models on Qualcomm inference accelerators. This role works closely with compiler, performance, and accuracy teams to translate models into accelerator-efficient execution while balancing throughput, latency, memory, and quality. The scope spans Day 0 enablement through production deployment, with a strong emphasis on scaling optimizations to future architectures.

Key Responsibilities

Architect and deliver model optimization strategies that transform PyTorch models for efficient inference on Qualcomm accelerators.
Drive graph capture and deployment using PyTorch, ONNX, and torch.compile, including model rewrites and graph-level transformations.
Design and implement fusion kernels using DSL-based approaches (e.g., Triton), enabling fused operations and performance-critical algorithmic rewrites.
Partner deeply with compiler, performance, and accuracy teams to co-design lowering strategies, kernel fusion, layout decisions, and runtime integration.
Profile and optimize LLM/VLM/diffusion inference for throughput and latency across batch sizes, sequence lengths, and serving modes.
Own transformer-specific optimizations including KV cache management, decoding behavior, and long-context performance.
Enable and optimize continuous batching (dynamic/iteration-level scheduling), understanding its impact on memory, scheduling, and tail latency.
Architect and scale distributed inference strategies (e.g., sharding and parallelism) across multi-core and multi-device systems.
Establish reusable approaches to scale model optimizations to new hardware architectures, creating robust patterns and tooling.
Debug complex performance or stability issues to root cause and drive production-ready solutions.

Required Qualifications

Expert-level proficiency in PyTorch and inference-focused model optimization; strong Python engineering skills.
Hands-on experience with torch.compile / TorchDynamo or related graph capture and compilation workflows.
Deep understanding of transformer architectures, attention mechanisms, MoEs, and performance trade-offs.
Practical experience with KV cache behavior, serving-time optimizations, and memory/performance trade-offs.
Strong foundation in computer architecture, ML accelerators, and distributed systems.
Proven ability to lead cross-functional technical efforts and influence design decisions.
MS in Computer Science, Machine Learning, Computer Engineering, or Electrical Engineering, or equivalent experience.

Preferred / Bonus Qualifications

Experience developing fusion kernels using Triton or similar DSLs, and collaborating with ML compiler teams.
Familiarity with LLM serving stacks and continuous batching systems.
Background in numerical methods, performance/accuracy trade-off analysis, or evaluation frameworks.
PhD in a relevant field.

Minimum Qualifications

Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 4+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
Master's degree in Computer Science, Engineering, Information Systems, or related field and 3+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience; OR
PhD in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.

Pay Range and Benefits

$ – $. Salary is one component of total compensation. Competitive annual discretionary bonus program, opportunity for annual RSU grants. Highly competitive benefits package. For more details, refer to Qualcomm U.S. benefits information.

Equal Opportunity Employer

Qualcomm is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or any other protected classification. Qualcomm is committed to providing reasonable accommodations for individuals with disabilities.
