Site Reliability Engineer
Listed on 2025-12-19
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
Optomi, in partnership with a leading technology operations center, is looking for an SRE – Cloud Platform to join their team in Plano, TX.
6-month contract-to-hire
Onsite in Plano, TX 4x/week
Position Summary: The SRE – Cloud Platform will be focused on operating and automating scalable, resilient AWS infrastructure. Working with core AWS services such as EKS, Lambda, Cloud WAN, ECR, and Systems Manager, this role will drive self-healing automation, observability, and CI/CD pipeline integration. The role embodies SRE best practices to ensure reliability, performance, and operational excellence of cloud-native platforms supporting business-critical applications. This position will collaborate closely with Cloud Platform Development Teams, Production Engineering, and Major Incident Management teams to resolve production issues and improve infrastructure.
- Opportunity to work with cutting-edge AWS technologies.
- Collaborative and cross-functional team environment.
- Focus on automation, scalability, and operational excellence.
- Solid understanding of SRE concepts: SLIs, SLOs, error budgets, incident response.
- Strong hands-on experience with AWS services such as EKS, Lambda, Cloud WAN, and Systems Manager.
- Experience with infrastructure-as-code tools like Terraform and CloudFormation.
- Proficiency in scripting languages such as Python, Bash, or PowerShell.
- Familiarity with DevOps tools like GitHub, Harness, and Dynatrace.
- Build and maintain components required to automate and self-heal AWS infrastructure.
- Develop and maintain infrastructure as code (IaC) using Terraform for scalable and repeatable deployments.
- Manage container orchestration platforms and related cloud-native services.
- Define and measure SLIs, SLOs, and error budgets, and drive reliability improvements.
- Implement monitoring and observability using Dynatrace and AWS native services like CloudWatch.
- Participate in incident management and on-call rotations, and lead blameless postmortems.
- Collaborate cross-functionally to embed SRE principles into cloud platform design and operation.
- Troubleshoot network issues and manage cloud routing.
- Certifications like AWS Certified DevOps Engineer or AWS Certified Solutions Architect.
- Knowledge of integration tools and technologies like MuleSoft, Camel, and message streaming services.
Mid-Senior level
Employment Type: Full-time
Job Function: Information Technology
Industries: IT Services and IT Consulting