DevOps Engineer - SRE and SaaS
Location: Englewood, Arapahoe County, Colorado, 80112, USA
Listed on: 2026-05-21
Listing for: Kaav, Inc.
Position type: Full Time
Job specializations:
- IT/Tech
Systems Engineer, IT Support, SRE/Site Reliability, Cloud Computing
Job Description & How to Apply Below
Opening / Selling Statement -
We are seeking a Mid-Level DevOps Engineer with Site Reliability Engineering (SRE) experience to contribute to the transition of Crew Management Applications to a web-based SaaS model hosted on AWS. The successful candidate will work under the guidance of a Senior DevOps Engineer, supporting critical system reliability, automation, and monitoring tasks while actively contributing to the successful implementation of key deliverables.
Required Skills - DevOps, Site Reliability Engineering (SRE), Kubernetes, AWS EKS
Job Duties -
- Support Key Deliverables:
Assist in implementing metrics collection, developing dashboards, conducting reliability audits, and creating runbooks as outlined in the project goals.
- Collaboration:
Work closely with the Senior DevOps Engineer, development teams, and support teams to ensure seamless operations and effective communication between stakeholders.
- CI/CD and Automation:
Contribute to the development and optimization of CI/CD pipelines and automation scripts to support efficient and consistent deployments.
- Observability Implementation:
Assist in configuring and maintaining monitoring solutions using OpenTelemetry and Grafana to enhance system visibility.
- Production Support:
Participate in 24/7 Tier II production support on a rotational basis, addressing technical escalations and contributing to system stability.
- Documentation:
Collaborate in the preparation of technical documentation, including runbooks, playbooks, and training materials for Tier I and II support teams.
- Dashboards and Metrics:
Support the development of Grafana dashboards for monitoring services, including Kubernetes platform components and internally developed services.
- Issue Investigation:
Assist in identifying and resolving issues reported from lower-tier support teams, ensuring timely resolution and minimizing customer impact.
- Game Day Scenarios:
Participate in the execution of Game Day scenarios to prepare for potential system failures and improve operational readiness.
- Reliability Contributions:
Work on tasks related to reliability audits, including submitting merge requests for simpler issues and escalating more complex problems to senior team members.
Job Requirements -
- Experience:
3-5 years in DevOps, SRE, or related roles with a focus on cloud-hosted, microservices-based environments.
- Technologies:
Familiarity with Kubernetes, AWS EKS, Terraform, ArgoCD, OpenTelemetry, and Grafana.
- DevOps Practices:
Basic understanding of CI/CD pipelines and infrastructure-as-code (IaC) principles.
- Incident Management:
Experience in troubleshooting and resolving technical issues in production environments.
- Collaboration:
Ability to work effectively as part of a team under the direction of senior engineers.
- Documentation:
Basic skills in technical writing, including the ability to contribute to incident runbooks and operational playbooks.
- On-Call Readiness:
Willingness to participate in 24/7 rotational production support as required.
Desired Skills & Experience -
- Exposure to GitOps practices and tools like GitLab.
- Experience contributing to dashboards and monitoring systems for production environments.
- Familiarity with automated remediation processes and system optimization practices.
- Background in supporting SaaS environments or cloud migrations.
Required Skills :
DevOps
Basic Qualification :
Additional Skills :
This is a high PRIORITY requisition. This is a PROACTIVE requisition.
Background Check: No
Drug Screen: No