Cloud Operations Engineer
Listed on 2026-02-16
IT/Tech
Systems Engineer, Cloud Computing
Candidates for this role may be located in Portugal, Spain, Poland or Ireland
Job Teaser Summary
Extreme’s Cloud Operations team is a group of talented engineers passionate about building highly reliable, scalable and secure solutions in public/private cloud environments. We are looking to hire a highly motivated Cloud Operations engineer with strong working experience in production operation and deployment automation. You will work with the team to design, develop and implement deployment automation solutions end-to-end. You will also be expected to participate in continuous cloud service operation, troubleshoot and resolve complex issues in production.
We will work together to design, develop and implement the best public / private / local cloud solutions for our customers. Extreme Networks is the right place to be and now is the right time to join us and be part of our spectacular growth and success. We're looking for the best and the brightest A-players who want to make a difference doing a job they love.
The Role
We want you to help lead infrastructure engineering for Extreme Cloud, a multi-cloud SaaS platform. Design, build, and operate large-scale, multi-region Kubernetes environments across AWS, GCP, Azure, and on-prem. Drive reliability, scalability, and operational excellence for a platform serving global customers.
What You'll Do
- Architect & Scale Infrastructure: Design and implement multi-cluster, multi-region Kubernetes deployments using EKS, GKE, and AKS. Build infrastructure that scales across regions and cloud providers.
- Own Production Systems: Take end-to-end ownership of production infrastructure. Drive incident response, postmortems, and improvements to prevent recurrence.
- Infrastructure as Code at Scale: Build and maintain Terraform modules for complex infrastructure patterns. Manage thousands of configuration files across clusters, regions, and environments using GitOps principles.
- GitOps & Deployment Excellence: Design and optimize ArgoCD ApplicationSets and Helm chart architectures. Build deployment pipelines that enable safe, automated releases across hundreds of microservices.
- Performance & Reliability Engineering: Analyze system performance, identify bottlenecks, and implement optimizations. Improve SLOs through capacity planning, autoscaling, and architectural improvements.
- Observability & Monitoring: Build and enhance monitoring, alerting, and observability using Prometheus, Grafana, Loki, and custom tooling. Drive visibility into complex distributed systems.
- Security & Compliance: Implement security controls, compliance frameworks, and best practices across cloud infrastructure. Design secure multi-tenant architectures.
- Technical Leadership: Mentor engineers, establish best practices, and drive technical decisions. Collaborate with platform, SRE, and product teams to deliver reliable infrastructure.
- 5+ years in cloud infrastructure engineering, with deep expertise in at least one major cloud provider (AWS preferred)
- Strong Kubernetes experience: cluster design, operators, controllers, and multi-cluster management
- Proficiency with Infrastructure as Code: Terraform, CloudFormation, or similar
- GitOps expertise: ArgoCD, Flux, or similar; experience with ApplicationSets and complex deployment patterns
- Deep Linux and networking knowledge
- Experience with distributed systems: Elasticsearch, PostgreSQL, Redis, Kafka, RabbitMQ
- Monitoring and observability: Prometheus, Grafana, ELK stack, or similar
- Strong problem-solving skills and experience debugging complex distributed systems
- Experience with cloud security, compliance (SOC 2, ISO 27001), and secure-by-design practices
- Excellent communication skills for working across time zones and with distributed teams
- Self-directed with a track record of owning problems end-to-end
- Experience with multi-cloud architectures and cloud-agnostic patterns
- Contributions to open-source infrastructure projects
- Experience with service mesh technologies (Istio, Linkerd)
- Knowledge of chaos engineering and reliability testing
- Experience with cost optimization and FinOps practices
- Work on infrastructure at…