Software Development Engineer — CI/CD, Trainium Manufacturing Test Infrastructure
Listed on 2026-06-05
IT/Tech
Systems Engineer, Cloud Computing
The Manufacturing Infrastructure Release Team within Annapurna ML builds and operates the software platform that orchestrates hardware testing and validation across multiple Trainium manufacturing sites worldwide. Our platform deploys containerized microservices to AWS Outposts at manufacturing partner factories, enabling component-level, card/board, server-level, and rack-level testing that directly enables the manufacturing ramp of AWS's custom AI training chips.
Key Job Responsibilities
- Design, build, and maintain CI/CD pipelines (AWS CDK, Pipelines) that deploy containerized services to AWS Outposts at global manufacturing sites.
- Extend the manufacturing infrastructure platform (TypeScript CDK, Python microservices) to support new workflows for Trainium accelerator cards, baseboards, and rack-level integration.
- Build integration test frameworks and canary systems that validate service health across all production sites before and after deployments.
- Develop automated alarming, rollback mechanisms, and deployment wave strategies to ensure zero‑downtime releases to active manufacturing lines.
- Develop infrastructure‑as‑code for containerized services, databases, artifact storage, messaging queues, and authentication systems deployed on Outposts.
- Collaborate with Test Engineering teams, Hardware Engineers, and Supply Chain to resolve bottlenecks in the manufacturing process.
Annapurna Labs is a wholly owned subsidiary of AWS, focused on developing custom silicon and servers including the Nitro, Graviton, Inferentia, and Trainium families of processors. Machine Learning Annapurna (MLA) functions as a vertically integrated team including software, firmware, hardware, and silicon design in a single organization. We are the Training Servers and Systems organization under MLA focused on Hardware Development, Software Development, Fleet Ops Systems, and Manufacturing, Quality, and Reliability.
This position is in the Manufacturing, Quality and Reliability team.
- BS degree in computer science or equivalent.
- Experience with at least one general‑purpose programming language such as Java, Python, C++, C#, Go, Rust, or TypeScript.
- Experience with CI/CD pipeline design and implementation (AWS Pipelines, CircleCI, GitLab CI, GitHub Actions, Jenkins, or similar).
- Experience with cloud services (AWS, GCP, or Azure), particularly IaC tools such as CDK, CloudFormation, Terraform, or Pulumi.
- Experience deploying software to edge/hybrid environments (AWS Outposts, on‑premises).
- Experience with containerized microservice architectures (Docker, ECS/EKS, Kubernetes).
- Familiarity with hardware test automation or manufacturing systems.
- Experience with setting up CI/CD for system software.
- Familiarity with network configuration in constrained environments (VPN, CIDR management, site connectivity).
Location: Cupertino, CA.
Salary range: USD 127,100–185,000 annually.
Amazon is an equal opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
The base salary range for this position is USD 127,100–185,000 annually. Amazon also offers comprehensive benefits including health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and optional supplemental life plans), 401(k) matching, paid time off, and parental leave.