Senior Chaos Engineer
Listed on 2026-02-13
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability
Saint Louis, United States | Posted on 11/21/2025
We are looking for a Senior Chaos Engineer with strong experience in testing cloud-based applications and infrastructure using chaos engineering practices. The ideal candidate will design and execute experiments that validate system resilience, reliability, and fault-tolerance across distributed cloud environments. This role requires deep knowledge of chaos tools, cloud platforms, and testing strategies for highly available systems.
Responsibilities:
Design and execute chaos experiments to test system resilience, recovery, and reliability.
Use tools like Gremlin, Chaos Monkey, Kube Monkey, or similar platforms to inject failures.
Validate distributed applications, microservices, and cloud services under stress, latency, and failure scenarios.
Develop automated test plans and resilience test suites.
Analyze experiment results and provide actionable recommendations to improve system stability.
Work closely with SRE, DevOps, Cloud, and engineering teams to integrate chaos testing into CI/CD.
Ensure best practices are followed for fault injection, observability, and high-availability architectures.
Mentor junior engineers and guide resilience engineering strategies.
Required Skills & Experience:
5+ years of experience in SRE/DevOps/Chaos Engineering or similar roles.
Strong hands-on experience with Chaos Monkey, Gremlin, Kube Monkey, Litmus, or equivalent chaos engineering tools.
Solid understanding of cloud platforms (AWS, Azure, or GCP) and distributed system design.
Strong skills in Linux, containers, Kubernetes, and microservices testing.
Experience with observability tools (Prometheus, Grafana, Splunk, ELK).
Good knowledge of automation and scripting (Python, Bash, or Go).
Excellent debugging, fault-analysis, and problem-solving skills.