Senior SRE Engineer: Cloud Infra, Observability & Automation
Listed on 2026-05-29
IT/Tech
SRE/Site Reliability, Cloud Computing
Site Reliability Engineer - Senior Staff
Req
Location:
Sunnyvale, California, United States, 94089
In our ‘always on’ world, we believe it’s essential to have a genuine connection with the work you do.
At Ruckus Networks, you will work on large-scale cloud networking platforms that support enterprise customers globally. You will help improve reliability, automation, observability, and customer experience while working with modern cloud and SRE technologies in a collaborative engineering environment.
How you’ll help us connect the world:
Ruckus Networks is looking for a customer-focused Senior Site Reliability Engineer (SRE) to help improve reliability, scalability, operational excellence, and customer experience across our cloud platform ecosystem.
This role is ideal for engineers who enjoy solving production problems, building automation, and improving platform reliability. You will work on distributed systems powering cloud networking services used by customers globally, in a fast-paced environment.
As part of the SRE organization, you will work closely with engineering, cloud operations, and support teams to improve platform stability, observability, automation, and operational readiness.
THIS IS A HYBRID ROLE THAT REQUIRES BEING ON-SITE AT OUR SUNNYVALE, CA OFFICE 3 DAYS A WEEK. NO RELOCATION OR 3RD-PARTY AGENCIES, PLEASE.
Key Responsibilities:
Reliability Engineering & Operations
Operate and improve highly available, scalable cloud services and infrastructure
Troubleshoot production issues across applications, infrastructure, networking, databases, and cloud services
Improve observability through metrics, logging, tracing, synthetic monitoring, and alerting
Help define and improve SLIs, SLOs, and operational health metrics
Participate in incident response and support Sev-1/customer-impacting events
Contribute to post-incident reviews and long-term reliability improvements
Improve operational processes, automation, and deployment safety
Automation & Engineering
Build operational tooling and automation using Python
Improve operational efficiency through automation and self-service tooling
Support CI/CD improvements and deployment validation workflows
Develop health checks, monitoring integrations, and operational diagnostics
Cloud & Infrastructure
Support services running in Google Cloud Platform (GCP)
Work with Kubernetes, containers, and cloud-native platforms
Analyze scalability, performance, and resource utilization
Collaborate with software engineering teams on operational readiness and reliability improvements
Observability & Monitoring
Build dashboards, alerts, and telemetry pipelines
Work with observability platforms such as Prometheus, Grafana, OpenTelemetry, and ELK
Support monitoring and analytics platforms including ClickHouse
Improve signal quality and reduce operational alert noise
Develop synthetic monitoring focused on customer workflows
Collaboration
Partner with Engineering, Product Management, Customer Support, and Cloud Operations teams
Participate in architecture and operational readiness discussions
Mentor junior engineers and contribute to SRE best practices
Promote operational excellence, ownership, and customer focus
Required Qualifications:
5+ years of experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or Production Engineering
Strong programming skills in Python
Experience with Linux systems administration and troubleshooting
Hands-on experience with Google Cloud Platform (GCP)
Experience with Kubernetes, containers, and cloud-native infrastructure
Experience troubleshooting distributed systems in production environments
Experience with observability tools such as Prometheus, Grafana, OpenTelemetry, or ELK
Familiarity with ClickHouse or large-scale telemetry platforms
Understanding of networking fundamentals, APIs, databases, and cloud architectures
Experience participating in production incident response and operational support
You excite us if you have:
Experience supporting SaaS or cloud platforms at scale
Familiarity with Kafka or event-driven architectures
Experience building automation and monitoring solutions
Familiarity with wireless networking or enterprise networking platforms
Experience…