Principal Kubernetes DevOps Engineer Job, San Francisco area, California, USA, IT/Tech

Requirements

We are seeking a Principal Kubernetes DevOps Engineer who combines deep technical expertise with broad system understanding
This engineer should be capable of diving into a wide range of services and identifying systemic issues across architecture, CI/CD flow, and containerization environments
This role requires technical leadership, analytical skill, and cross-team collaboration to drive reliability, scalability, and modernization
15+ years in DevOps or SRE for large-scale production systems. Strong hands-on background in Linux systems, networking, and distributed systems
Experience operating and designing low-latency, high-throughput backend services at global scale. Knowledge of media or real-time communication systems (e.g., MMR, WebRTC)
Knowledge of TCP/IP, routing, DNS, load balancing, and packet capture tools. Familiarity with colocation data center operations, including hardware provisioning and troubleshooting
Demonstrated experience with Terraform, Ansible, Kubernetes, Docker, and modern CI/CD pipelines. Strong problem-solving, debugging, and systems-level design skills
Occasional weekend work may be required
Ability to work with teams across the globe and in multiple time zones

What the job involves

At Zoom, we’re building the next generation of Cloud and Colocation (Colo) infrastructure that powers seamless communication and collaboration for millions of users worldwide
Leading deep-dive investigations across diverse services and environments. Working on systems ranging from real-time media to web, team chat, and AI to uncover architectural or operational bottlenecks
Designing and implementing improvements in deployment pipelines, orchestration frameworks, and CI/CD automation to increase reliability and release velocity
Working closely with product and service owners to enhance containerization strategy, improve resource efficiency, and reduce operational friction
Partnering with the Meeting DevOps and Cloud Infra teams to modernize hybrid infrastructure spanning colocation data centers, AWS, OCI, and other cloud providers
Driving system observability, fault isolation, and resilience engineering, ensuring services meet strict availability and latency SLAs
Providing technical mentorship to DevOps engineers and influencing best practices in automation, monitoring, and release engineering. Championing a culture of data-driven reliability through postmortems, SLIs/SLOs, and continuous performance optimization
