Site Reliability Engineer - Senior Job, Sunnyvale area, California, USA, IT/Tech

Position: Site Reliability Engineer - Senior Staff

Overview

Location: Sunnyvale, California, United States, 94089

In our ‘always on’ world, we believe it’s essential to have a genuine connection with the work you do.

At Ruckus Networks, you will work on large-scale cloud networking platforms that support enterprise customers globally. You will help improve reliability, automation, observability, and customer experience while working with modern cloud and SRE technologies in a collaborative engineering environment.

How you’ll help us connect the world:

Ruckus Networks is looking for a customer-focused Senior Site Reliability Engineer (SRE) to help improve reliability, scalability, operational excellence, and customer experience across our cloud platform ecosystem.

This role is ideal for engineers who enjoy solving production problems, building automation, and improving platform reliability. You will work in a fast-paced environment on distributed systems that power cloud networking services used by customers globally.

As part of the SRE organization, you will work closely with engineering, cloud operations, and support teams to improve platform stability, observability, automation, and operational readiness.

THIS IS A HYBRID ROLE THAT REQUIRES WORKING ON-SITE AT OUR SUNNYVALE, CA OFFICE 3 DAYS A WEEK. NO RELOCATION OR 3RD PARTY AGENCIES, PLEASE.

Key Responsibilities

Operate and improve highly available, scalable cloud services and infrastructure
Troubleshoot production issues across applications, infrastructure, networking, databases, and cloud services
Improve observability through metrics, logging, tracing, synthetic monitoring, and alerting
Help define and improve SLIs, SLOs, and operational health metrics
Participate in incident response and support Sev-1/customer-impacting events
Contribute to post-incident reviews and long-term reliability improvements
Improve operational processes, automation, and deployment safety

Automation & Engineering

Build operational tooling and automation using Python
Improve operational efficiency through automation and self-service tooling
Support CI/CD improvements and deployment validation workflows
Develop health checks, monitoring integrations, and operational diagnostics

Cloud & Infrastructure

Support services running in Google Cloud Platform (GCP)
Work with Kubernetes, containers, and cloud-native platforms
Analyze scalability, performance, and resource utilization
Collaborate with software engineering teams on operational readiness and reliability improvements

Observability & Monitoring

Build dashboards, alerts, and telemetry pipelines
Work with observability platforms such as Prometheus, Grafana, OpenTelemetry, and ELK
Support monitoring and analytics platforms including ClickHouse
Improve signal quality and reduce operational alert noise
Develop synthetic monitoring focused on customer workflows

Collaboration

Partner with Engineering, Product Management, Customer Support, and Cloud Operations teams
Participate in architecture and operational readiness discussions
Mentor junior engineers and contribute to SRE best practices
Promote operational excellence, ownership, and customer focus

Required Qualifications

5+ years of experience in Site Reliability Engineering, DevOps, Cloud Infrastructure, or Production Engineering
Strong programming skills in Python
Experience with Linux systems administration and troubleshooting
Hands-on experience with Google Cloud Platform (GCP)
Experience with Kubernetes, containers, and cloud-native infrastructure
Experience troubleshooting distributed systems in production environments
Experience with observability tools such as Prometheus, Grafana, OpenTelemetry, or ELK
Familiarity with ClickHouse or large-scale telemetry platforms
Understanding of networking fundamentals, APIs, databases, and cloud architectures
Experience participating in production incident response and operational support

You excite us if you have:

Experience supporting SaaS or cloud platforms at scale
Familiarity with Kafka or event-driven architectures
Experience building automation and monitoring solutions
Familiarity with wireless networking or enterprise networking platforms
Experience improving operational…