Sr Software Development Engineer, Neuron Collectives, Annapurna Labs

Job in Cupertino, Santa Clara County, California, 95014, USA
Listing for: Amazon
Full Time position
Listed on 2026-06-06
Job specializations:
  • IT/Tech
    AI Engineer, Machine Learning/ ML Engineer
Salary/Wage Range or Industry Benchmark: USD 80,000 - 100,000 per year
Job Description & How to Apply Below

Annapurna Labs is an integral part of AWS and develops hardware and software components that are critical building blocks for EC2 infrastructure. We specialize in designing software, systems, and chips that optimize the AWS customer experience. The AWS Neuron Collectives team is seeking a Software Engineer to optimize collective operations for AWS Trainium, one of Amazon's highest-priority initiatives powering the frontier AI models being trained today.

Collectives are the critical operations that scale AI compute across the data center. You will work in depth to optimize compute for the specific topologies used to train modern LLMs, pushing for maximum performance using C/C++, interfacing with DMA and firmware, and investigating detailed topologies. You will analyze current collective algorithms with publicly accessible tools like Neuron Explorer and optimize these to fully utilize compute and bus bandwidth to scale across the data center.

This is a unique opportunity to impact how AI training runs at AWS scale while growing your technical breadth and depth.

About the team

Annapurna Labs created Trainium as a purpose-built AI training chip to revolutionize machine learning at Amazon scale. The Neuron Collectives team owns the software stack that enables collective operations—the communication primitives that allow AI training to scale across thousands of chips in the data center. Our work is essential to training the frontier models that power AI today, and we collaborate closely with hardware teams to extract maximum performance from Trainium, ensuring that compute and interconnect bandwidth are fully utilized.

Key job responsibilities
  • Enhance collective algorithms and topologies for optimal training performance
  • Use tools like Neuron Explorer to identify bottlenecks in compute and bus bandwidth utilization
  • Monitor and analyze processor, DMA, firmware, and workload metrics
  • Optimize collective operations to scale AI compute across the data center
  • Work closely with the hardware team to co‑optimize software and Trainium silicon
  • Develop and optimize C/C++ implementations of collective communication patterns
  • Investigate and implement improvements for specific training topologies used by modern LLMs
  • Build and maintain analysis frameworks and automation solutions
Qualifications
  • Bachelor's degree in computer science or equivalent
  • 5+ years of experience building complex software systems that have been successfully delivered to customers
  • 5+ years of experience contributing to the architecture and design (architecture, design patterns, reliability and scaling) of new and current systems
  • Master's degree in computer science or equivalent
  • Familiarity with collective communication algorithms (e.g., all‑reduce, all‑gather) or distributed training frameworks
Equal Opportunity Employer

Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.

Los Angeles County Applicants

Job duties for this position include working safely and cooperatively with other employees, supervisors, and staff; adhering to standards of excellence despite stressful conditions; communicating effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service; and following all federal, state, and local laws and Company policies. Criminal history may have a direct, adverse, and negative relationship with some of the material job duties of this position.

These include the duties and responsibilities listed above, as well as the abilities to adhere to company policies, exercise sound judgment, effectively manage stress and work safely and respectfully with others, exhibit trustworthiness and professionalism, and safeguard business operations and the Company’s reputation. Pursuant to the Los Angeles County Fair Chance Ordinance, we will consider for employment qualified applicants with arrest and conviction records.

Benefits

The base salary range for this position is listed below. Your Amazon package will include sign‑on payments and restricted stock units (RSUs). Final compensation will be determined based on factors including experience, qualifications, and location. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage), 401(k) matching, paid time off, and parental leave.

Location and Salary

USA, CA, Cupertino -  -  USD annually
