Cloud Operations Engineer Job, San Jose area, California, USA, IT/Tech

Position: Staff Cloud Operations Engineer

Candidates for this role may be located in Portugal, Spain, Poland or Ireland

Job Teaser Summary

Extreme’s Cloud Operations team is a group of talented engineers passionate about building highly reliable, scalable, and secure solutions in public and private cloud environments. We are looking to hire a highly motivated Cloud Operations engineer with strong hands-on experience in production operations and deployment automation. You will work with the team to design, develop, and implement deployment automation solutions end-to-end. You will also take part in continuous cloud service operations, troubleshooting and resolving complex issues in production.

We will work together to design, develop, and implement the best public, private, and local cloud solutions for our customers. Extreme Networks is the right place to be, and now is the right time to join us and be part of our spectacular growth and success. We're looking for the best and the brightest A-players who want to make a difference doing a job they love.

About the Role

We want you to help lead infrastructure engineering for Extreme Cloud, a multi-cloud SaaS platform. You will design, build, and operate large-scale, multi-region Kubernetes environments across AWS, GCP, Azure, and on-prem, and drive reliability, scalability, and operational excellence for a platform serving global customers.

What You'll Do

Architect & Scale Infrastructure: Design and implement multi-cluster, multi-region Kubernetes deployments using EKS, GKE, and AKS. Build infrastructure that scales across regions and cloud providers.
Own Production Systems: Take end-to-end ownership of production infrastructure. Drive incident response, postmortems, and improvements to prevent recurrence.
Infrastructure as Code at Scale: Build and maintain Terraform modules for complex infrastructure patterns. Manage thousands of configuration files across clusters, regions, and environments using GitOps principles.
GitOps & Deployment Excellence: Design and optimize ArgoCD ApplicationSets and Helm chart architectures. Build deployment pipelines that enable safe, automated releases across hundreds of microservices.
Performance & Reliability Engineering: Analyze system performance, identify bottlenecks, and implement optimizations. Improve SLOs through capacity planning, autoscaling, and architectural improvements.
Observability & Monitoring: Build and enhance monitoring, alerting, and observability using Prometheus, Grafana, Loki, and custom tooling. Drive visibility into complex distributed systems.
Security & Compliance: Implement security controls, compliance frameworks, and best practices across cloud infrastructure. Design secure multi-tenant architectures.
Technical Leadership: Mentor engineers, establish best practices, and drive technical decisions. Collaborate with platform, SRE, and product teams to deliver reliable infrastructure.

What We're Looking For

5+ years in cloud infrastructure engineering, with deep expertise in at least one major cloud provider (AWS preferred)
Strong Kubernetes experience: cluster design, operators, controllers, and multi-cluster management
Proficiency with Infrastructure as Code: Terraform, CloudFormation, or similar
GitOps expertise: ArgoCD, Flux, or similar; experience with ApplicationSets and complex deployment patterns
Deep Linux and networking knowledge
Experience with distributed systems: Elasticsearch, PostgreSQL, Redis, Kafka, RabbitMQ
Monitoring and observability: Prometheus, Grafana, ELK stack, or similar
Strong problem-solving skills and experience debugging complex distributed systems
Experience with cloud security, compliance (SOC2, ISO 27001), and secure-by-design practices
Excellent communication skills for working across time zones and with distributed teams
Self-directed with a track record of owning problems end-to-end

Nice to Have

Experience with multi-cloud architectures and cloud-agnostic patterns
Contributions to open-source infrastructure projects
Experience with service mesh technologies (Istio, Linkerd)
Knowledge of chaos engineering and reliability testing
Experience with cost optimization and FinOps practices

Why This Role

Work on infrastructure at…

