Senior Software Engineer - SoC DevOps, MLA-MI - Annapurna Labs
Listed on 2026-05-16
Software Development
DevOps, Software Engineer, Cloud Engineer - Software
The Senior SoC Software DevOps Engineer role centers on enabling the rapid and reliable development of software for AWS’s most advanced custom machine learning chips. This position is critical to supporting the Trainium and Inferentia families of silicon, which power large-scale AI training. The engineer will serve as the primary owner of infrastructure that directly affects how quickly software teams can iterate on code for both pre-silicon simulation environments and post-silicon production deployments.
By building robust automation and tooling, the role ensures that tape-outs for new chips stay on schedule and that software is ready to run as soon as first silicon becomes available. This work has a direct impact on AWS’s ability to deliver advanced ML infrastructure to its largest customers.
This role operates at the intersection of hardware and software, requiring deep expertise in infrastructure engineering to solve unique challenges such as coordinating releases across isolated environments and validating firmware on real silicon. It is a foundational position for the SoC software teams because it frees engineers from infrastructure burdens, allowing them to focus on feature development. Success in this role will be measured by improvements in development velocity, release quality, and the stability of systems that support multiple teams.
The position demands a proactive approach to identifying bottlenecks and a strong ability to operate within novel technical contexts without prior domain knowledge in machine learning or chip design.
- Own the end-to-end CI/CD pipelines and release processes for all SoC software components, including firmware, hardware abstraction layers, and modeling tools. This involves designing, maintaining, and evolving systems that produce reliable releases for both internal verification teams and external AWS services. A key task is ensuring these pipelines function across heterogeneous environments such as corporate networks and VPCs.
- Build qualification workflows that guarantee software meets strict quality standards before reaching customers or verification teams.
- Develop hardware-in-the-loop test infrastructure that validates SoC software on actual silicon in laboratory and automated testing settings. This includes creating frameworks to run tests on real chips, simulate pre-silicon environments, and integrate results into continuous integration workflows.
- Build observability tools, such as dashboards that track build health, test coverage, and pipeline performance, along with alerting systems that notify teams of regressions.
- Identify and remove friction in development workflows, such as slow build times or complex release steps, using data-driven insights to prioritize improvements that accelerate team productivity.
- Solve novel problems like bridging disconnected environments and orchestrating synchronized releases across multiple domains.
We’re part of the SoC Software organization within Annapurna Labs (AWS). Our three software teams, uCode, HAL (Hardware Abstraction Layer), and Modeling, build the firmware, drivers, and virtual platforms for AWS’s custom ML accelerator chips. We operate like a startup: small teams, high ownership, direct impact on AWS’s most strategic silicon programs. This DevOps engineer will work across all three teams, with a mandate to improve velocity, quality, and developer experience for the entire SoC software organization.
Our team is dedicated to supporting new members. We have a broad mix of experience levels and tenures, and we’re building an environment that celebrates knowledge-sharing and mentorship.
- 7+ years of experience across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
- Experience as a mentor, tech lead, or leader of an engineering team.
- Experience programming in Python and at least one of: Bash, Go, C++, or Java.
- Experience with infrastructure-as-code (CDK, CloudFormation, Terraform, etc.).
- Experience with AWS services (Lambda, S3, EC2, CloudWatch, IAM, Secrets Manager, etc.).
- Experien…