DevOps Engineer III
Listed on 2026-02-16
IT/Tech
Cloud Computing, Systems Engineer
Everseen: A leader in vision AI solutions for the world’s leading retailers.
The Role
As a DevOps Engineer III, you will be part of the L3 support team for Operations across Edge/on‑prem and cloud, owning complex incidents end‑to‑end: triage, deep‑dive debugging, root‑cause analysis, remediation, and follow‑ups.
A good understanding of our product, its components, and their interactions is essential for troubleshooting and problem remediation. Strong Linux administration skills (primarily RHEL, plus Ubuntu) and OpenShift/Kubernetes expertise are also essential.
To reduce Operations (Customer Deployment) issues, you will build targeted automations (Python, Bash, Ansible) and automate new and existing SOPs used by Operations.
You will execute safe cloud deployments and upgrades via GitOps and IaC pipelines (Flux, Ansible, Terraform) on AKS and GKE, coordinating validation and rollback plans, and contribute to the maintenance of existing GitLab CI/CD pipelines together with the DevOps engineering teams.
You will design and continuously refine Alertmanager rules and standardize actionable Grafana dashboards with Operations, ensuring effective use of Prometheus metrics and logs (Grafana Alloy, Thanos).
Beyond day‑to‑day operations, you’ll apply deep DevOps, CI/CD, and infrastructure automation expertise, drive best practices, share knowledge through workshops and mentoring, write and maintain documentation and SOPs (Standard Operating Procedures), test infrastructure, and collaborate across teams to optimize systems and workflows.
What you’ll do
- Designs and maintains CI/CD pipelines using GitLab CI/CD.
- Implements Infrastructure as Code (IaC) with tools like Terraform.
- Oversees advanced CI/CD pipeline setups, including GitOps with Flux CD.
- Automates complex workflows and enhances infrastructure scalability.
- Troubleshoots and optimizes Kubernetes cluster operations.
- Integrates monitoring solutions for observability.
- Writes and maintains system operations documentation (articles, diagrams, data flows, etc.) for new and existing applications and services.
- Keeps up‑to‑date on best practices and new technologies.
- Conducts, designs, and executes staging/UAT/production and mass service deployment scenarios.
- Collaborates on technical architecture and system design.
- Collects and analyzes data: log files, application stack traces, thread dumps, etc.
- Reproduces and simulates application incidents to create debug reports and coordinate delivery of application fixes.
- Evaluates existing components or systems to determine integration requirements and to ensure the final solutions meet organizational needs.
- Interacts with cross‑functional management on high‑profile technical operations while providing clear feedback and leadership to support teams.
- Authors knowledge base articles and drives internal knowledge sharing.
- Occasionally works outside routine hours.
- Works with customers and travels to high‑profile international customer or partner locations.
- Operations (Customer Deployment) teams: Collaborate with the Operations teams to troubleshoot and resolve L3 tickets, and create automations to reduce and optimize workload.
- DevOps Cloud and Edge teams: Work closely with the wider DevOps engineering teams, your manager, developers, and QA engineers to understand requirements, provide technical guidance, and ensure smooth integration and deployment of our product.
- Security Team: Collaborate with the team to ensure the security of our cloud and edge solutions.
- CI/CD Tools: GitLab CI/CD
- Cloud Platforms: Azure (AKS, Registry), GCP (GKE)
- Edge Platforms: Docker, Podman, Kubernetes (k0s), and OpenShift
- Edge OS: RHEL, Ubuntu
- Automation Tools: Ansible (AWX), Jinja, Terraform
- Deployment Tools: Helm, Flux CD
- Observability: Prometheus, Loki, Grafana Alloy, Grafana dashboards, Thanos
- Databases: Elasticsearch, MongoDB
- Authentication: Keycloak
- Scripting Languages: Python, Bash
- Experience: 4+ years in DevOps-related roles with a strong focus on automation.
- Networking: Proficient in DNS, routing, container communication, firewalls, reverse‑proxying, load‑balancing, edge-to-cloud communication, and troubleshooting.
- System…