Senior AI Performance Architect
Job in
Raleigh, Wake County, North Carolina, 27601, USA
Listed on 2026-06-28
Listing for:
Nutanix
Full Time position
Job specializations:
- IT/Tech: AI Engineer (Applied/Software), Systems Engineer, Hardware Engineer, Machine Learning/ML Engineer
- Engineering: AI Engineer (Applied/Software), Systems Engineer, Hardware Engineer
Job Description & How to Apply Below
Qualcomm Technologies, Inc.
Job Area:
Engineering Group, Engineering Group Machine Learning Engineering
General Summary:
Today, more intelligence is moving to end devices, and mobile is becoming a pervasive AI platform. At the same time, data centers are expanding AI capability through widespread deployment of ML accelerators. Qualcomm envisions making AI ubiquitous - expanding beyond mobile and powering other end devices, data centers, vehicles, and things. We are inventing, developing, and commercializing power-efficient on-device AI, edge cloud AI, data center and 5G to make this a reality.
We are looking for AI Accelerator Architecture Engineers to drive functional, performance, and power enhancements into the HW to enable state-of-the-art training capabilities. AI inference and training systems must scale to a large number of accelerators, servers, and racks. Our devices must be designed to scale to handle the largest of today's models.
The AI Architecture team comprises experts who span the full gamut from software architecture, algorithm development, and kernel optimization down to hardware accelerator block architecture and SoC design. The ideal candidate will augment the team by contributing to one or more of these areas.
Responsibilities:
Understand trends in ML network design through customer engagements and the latest academic research, and determine how these trends will affect both SW and HW design
Work with customers to determine hardware requirements for AI training systems
Analyze current accelerator and GPU architectures
Architect enhancements required for efficient training of AI models
Design and architecture of:
Flexible computational blocks
  Involving a variety of datatypes: floating point, fixed point, microscaling
  Involving a variety of precisions: 32/16/8/4/2/1
  Capable of optimally performing dense and sparse GEMM, GEMV
Memory technology and subsystems that are optimized for a range of requirements
  Capacity
  Bandwidth
  Compute in memory, compute near memory
Scale-out and scale-up architectures
  Switches, NoCs, codesign with communication collectives
  Optimized for power
Ability to perform competitive analysis
Codesign HW with SW/GenAI (LLM) requirements
Define performance models to prove effectiveness of architecture proposals
Pre-Silicon prediction of performance for various ML training workloads
Analyze performance/area/power trade-offs for future HW and SW ML algorithms, including the impact of SoC components (memory and bus)
Requirements:
Master's degree in Computer Science, Engineering, Information Systems, or related field
3+ years Hardware Engineering experience defining architecture of GPUs or accelerators used for training of AI models
In-depth knowledge of NVIDIA/AMD GPU capabilities and architectures
Knowledge of LLM architectures and their HW requirements
Preferred Skills and Experience:
Knowledge of computer architecture, digital circuits and hardware simulators
Knowledge of communication protocols used in AI systems
Knowledge of Network-on-Chip (NoC) designs used in System-on-Chip (SoC) designs
Understanding of various memory technologies used in AI systems
Experience in modeling hardware and workloads in order to extract performance and power estimates
High-level hardware modeling experience preferred
Knowledge of AI training systems such as NVIDIA DGX and NVL72
Experience training and fine-tuning LLMs using distributed training frameworks such as DeepSpeed, FSDP
Knowledge of front-end ML frameworks (e.g., TensorFlow, PyTorch) used for training of ML models
Strong communication skills (written and verbal)
Detail-oriented with strong problem-solving, analytical and debugging skills
Demonstrated ability to learn, think and adapt in a fast-changing environment
Ability to code in C++ and Python
Knowledge of a variety of classes of ML models (e.g., CNN, RNN, etc.)
Minimum Qualifications:
Bachelor's degree in Computer Science, Engineering, Information Systems, or related field and 2+ years of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
Master's degree in Computer Science, Engineering, Information Systems, or related field and 1+ year of Hardware Engineering, Software Engineering, Systems Engineering, or related work experience.
OR
PhD in Computer Science, Engineering, Information Systems, or related field.
Qualcomm is an equal opportunity employer. If you are an individual with a disability and need an accommodation during the application/hiring process, rest assured that Qualcomm is committed to providing an accessible process. You may e-mail disabili or call Qualcomm's toll-free number found here . Upon request, Qualcomm will provide reasonable accommodations to support individuals with disabilities to be able to participate in the hiring process.
Qualcomm is also committed to making our workplace accessible for individuals with disabilities. (Keep in mind that this email address is used to provide reasonable accommodations for individuals with disabilities.…
Position Requirements
10+ Years
work experience