Software Development Engineer, AWS Resilience, Health Guardian
Listed on 2026-06-02
Software Development
Software Engineer, Cloud Engineer - Software, DevOps, AI Engineer
Amazon Data Services, Inc.
AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we’re looking for talented people who want to help.
You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.
The Health Guardian team is looking for a software engineer who is excited about building automated detection and mitigation systems that protect AWS infrastructure. These systems detect subtle failures that evade traditional health checks and automatically remove affected resources from service before customers are impacted. Our systems run across every AWS region, and we’re scaling coverage from hundreds of services to thousands. This is a hands‑on position where you will design and deliver significant software components, drive cross‑team technical alignment, and mentor other engineers.
You need to be a strong software developer with a track record of delivering, and you must also excel in communication, technical leadership, and customer focus. You’ll leverage generative AI tools as part of your daily workflow to accelerate design, development, and validation. This is an opportunity to join a small, high‑impact team solving hard reliability problems and to help shape both the technology and the direction of automated failure protection across AWS.
Responsibilities
- Design and deliver systems that span multiple AWS teams and organizational boundaries.
- Build detection algorithms and experimentation frameworks that validate changes at scale.
- Architect safety mechanisms—circuit breakers, throttling, validation—that let automation scale without unintended customer impact.
- Own ambiguous problems end‑to‑end from design through operations.
- Mentor other engineers and lead technical design reviews.
- Use AI‑assisted development tools to prototype, test, and validate faster.
We are a small team with outsized impact on AWS reliability. We operate what we build, and every engineer has direct visibility into how their code performs during real infrastructure events. We solve complex distributed systems challenges to ensure automated protection works reliably even during the failures it’s designed to detect. We value operational rigor, building systems that are safe by default, and solving hard problems with simple designs.
Basic Qualifications
- 3+ years of non‑internship professional software development experience.
- 2+ years of non‑internship experience in the design or architecture (design patterns, reliability, and scaling) of new and existing systems.
- 2+ years of experience programming in at least one software programming language.
- 2+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
- Bachelor’s degree in computer science or equivalent.
- Experience in mentoring, leading, or managing more junior engineers.
Amazon is an equal‑opportunity employer and does not discriminate on the basis of protected veteran status, disability, or other legally protected status.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit…