Principal Site Reliability Engineer

Remote / Online - Candidates ideally in
Aurora, Arapahoe County, Colorado, 80012, USA
Listing for: Lumen Technologies
Remote/Work from Home position
Listed on 2025-12-18
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below

Lumen connects the world. We are igniting business growth by connecting people, data and applications – quickly, securely, and effortlessly. Together, we are building a culture and company from the people up – committed to teamwork, trust and transparency. People power progress.

We’re looking for top-tier talent and offer the flexibility you need to thrive and deliver lasting impact. Join us as we digitally connect the world and shape the future.

The Role

We are looking for a Senior Site Reliability Engineer (SRE) / Platform Engineer / DevOps Engineer with deep expertise in Kubernetes to design, implement, and manage high-availability, scalable systems primarily on AWS EKS. In this role, you will leverage tools like Terraform, ArgoCD, and GitHub Actions to automate infrastructure and workflows while implementing progressive deployment practices (e.g., blue-green, canary, or feature flagging).

This position requires someone who can troubleshoot complex systems, implement robust monitoring and guardrails for databases and applications, and maintain a focus on optimizing performance, reliability, and cost-efficiency.

Location

This role is designated as a fully remote position within the United States.

The Main Responsibilities
  • Kubernetes Management & Troubleshooting: Design and manage Kubernetes clusters (AWS EKS) with a focus on networking, scalability, security, and reliability. Troubleshoot complex, cross-system issues involving Kubernetes, databases, networking, and cloud infrastructure. Implement and maintain guardrails to ensure consistent and secure operation of Kubernetes workloads.
  • Infrastructure Design & Automation: Architect, build, and maintain highly available, fault-tolerant systems using AWS services. Use Terraform to define infrastructure as code, enabling scalable, repeatable, and secure deployments. Automate provisioning, configuration, and updates for cloud infrastructure with a focus on GitOps principles using ArgoCD and GitHub Actions.
  • System Guardrails & Application Monitoring: Set up and enforce guardrails for databases, infrastructure, and applications, ensuring consistency and adherence to best practices. Implement robust application and infrastructure monitoring using tools like Prometheus, Grafana, and potentially Datadog. Ensure proactive alerting and predictive monitoring to detect issues before they impact users.
  • Progressive Deployment & CI/CD: Design and implement deployment strategies like blue-green deployments, canary releases, and feature-flag-based rollouts. Develop and maintain CI/CD pipelines to streamline application delivery, testing, and deployment.
  • Collaboration & Best Practices: Partner with development teams to embed reliability and security best practices into the application lifecycle. Drive a culture of operational excellence, ensuring teams build for reliability, scalability, and security from the ground up.
  • Resilience & Continuous Improvement: Conduct post-incident reviews to identify root causes and prevent future incidents. Implement practices like chaos engineering to test and enhance system resilience.
  • Networking & Security: Design and manage secure networking solutions, including AWS VPCs, Kubernetes networking, and firewalls. Ensure compliance with security best practices and industry standards.

What We Look For in a Candidate

Required Qualifications:

  • 10+ years of related experience in software development, systems engineering, and/or networking
  • Kubernetes Expertise - Deep hands-on experience managing Kubernetes clusters (AWS EKS or similar) with a focus on networking, scaling, and security. Strong troubleshooting skills across Kubernetes workloads, infrastructure, and networking.
  • Infrastructure as Code & Automation - Expertise in Terraform for infrastructure as code. Proven experience with ArgoCD and GitHub Actions for GitOps workflows and CI/CD pipelines.
  • Monitoring & Observability - Proficiency in Prometheus, Grafana, and incident management workflows. Experience implementing application-level monitoring and tracing to identify performance bottlenecks.
  • Guardrails & System Security - Demonstrated ability to set up…