Lead Site Reliability Engineer; SRE - Cloud/DevOps Engineer Job, Des Moines area, Iowa, USA (IT/Tech)

Position: Lead Site Reliability Engineer (SRE) - Cloud/DevOps Engineer

About Lumen

Lumen connects the world. We are igniting business growth by connecting people, data and applications – quickly, securely, and effortlessly. Together, we are building a culture and company from the people up – committed to teamwork, trust and transparency. People power progress. We’re looking for top-tier talent and offer the flexibility you need to thrive and deliver lasting impact. Join us as we digitally connect the world and shape the future.

The Role

We are seeking a highly skilled and proactive Site Reliability Engineer (SRE) to join our team, focusing on production support and performance optimization across our portal ecosystem. This role is critical to ensuring the reliability, scalability, and efficiency of our systems, with a strong emphasis on AWS infrastructure, observability, and automation.

The SRE also understands the software development lifecycle (from coding to support) and is familiar with a range of automation tools for building CI/CD pipelines. This role will shape how Lumen combines the latest technologies and services to automate all aspects of software deployment and application lifecycle management. The ideal candidate is passionate about software automation and always treats quality as a priority.

This role will collaborate with key stakeholders across the engineering organization, including product owners, developers, and testers, to design, optimize, and automate business and technical processes.

The Main Responsibilities

Production Support & Incident Management

Provide Tier 2/3 support for portal services by troubleshooting and resolving technical issues in test and production environments.
Lead root cause analysis and post-mortem processes to ensure continuous improvement.

Performance Optimization

Monitor system performance and proactively identify bottlenecks or degradation.
Implement tuning strategies across application layers, databases, and infrastructure.
Drive initiatives to improve latency, throughput, and resource utilization.

Monitoring & Observability

Design and maintain dashboards, alerts, and metrics using tools like CloudWatch, Grafana, or similar.
Ensure comprehensive coverage of system health indicators and business KPIs.

Automation & Infrastructure as Code

Develop and maintain automation scripts and tools for deployment, scaling, and recovery.
Use Terraform or similar IaC tools to manage AWS resources.
Automate routine operational tasks to improve efficiency and reduce human error.

Reliability Engineering

Champion SRE principles such as SLIs, SLOs, and error budgets.
Participate in reliability reviews.
Advocate for resilient architecture and fault-tolerant design patterns.

Collaboration & Communication

Work closely with software engineers, DevOps, and product teams to align reliability goals.
Document processes, runbooks, and best practices for knowledge sharing.
Provide mentorship and guidance on reliability and operational excellence.
Create and maintain detailed technical documentation for software solutions.
Stay up to date on the latest software engineering trends and technologies.

What We Look For in a Candidate

US Citizen on US soil.
5+ years of experience with Java and microservice architecture.
5+ years of overall professional experience in SRE, DevOps, or infrastructure engineering roles.
Experience with Terraform or similar IaC tools to manage cloud resources.
Proficiency in scripting languages (Python, Bash, etc.) and automation frameworks.
Experience with CI/CD pipelines and tools like Jenkins, GitHub Actions, or GitLab CI.
Solid understanding of monitoring and logging tools (e.g., CloudWatch, ELK, Datadog).
Familiarity with containerization and orchestration (Docker, Kubernetes).
Excellent problem-solving skills and a proactive mindset.

Preferred Requirements

Experience with AWS services (EC2, CloudFront, EKS, RDS, S3, etc.).
Certifications in AWS or related technologies are a plus.
Experience in application development using Java microservices and the Spring Boot framework.
Experience in frontend development (JavaScript/TypeScript and frameworks such as Vue.js).
Experience with Agile/Scrum methodologies and development practices.
Hands-on experience with a frontend tech stack is nice to have.

Compensation

This…

