Sr Operational Support Engineer Job, Atlanta area, Georgia, USA, IT/Tech

Position: Sr Staff Operational Support Engineer

Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you’ll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent.

We’re big enough to give you all the resources you need, and small enough that you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is flexible enough to support where, when, and how you do your best work.

The Dolby Cloud Solutions organization builds technologies and innovations that easily integrate into service providers’ infrastructure to make content experiences more effective, meaningful, and engaging for consumers.

Dolby OptiView is building a world-class Operational Support organization responsible for the stability, availability, and operational maturity of our 24/7 live video streaming, ads, player, and real‑time delivery platforms. As a Senior Operational Support Engineer, you provide technical and operational leadership for the most complex, high‑impact production scenarios. You act as the escalation point beyond L2, lead critical incident response for Tier‑1 customers and marquee live events, and drive systemic improvements across reliability, automation, and operational readiness.

This role goes beyond incident resolution. You shape how we operate at scale, influence platform design through operational feedback, and partner deeply with Engineering, DevOps, Product, and Support leadership to continuously raise the reliability bar for OptiView.

Key Responsibilities

Incident Leadership & Escalations

Serve as the final operational escalation point for severe, complex, or prolonged customer‑impacting incidents
Lead resolution of multi‑system, multi‑team incidents spanning streaming pipelines, player platforms, ad insertion, DRM, CDN, and real‑time services
Own incident command during major live events, including decision‑making under pressure and risk‑based trade‑offs
Drive high‑quality, executive‑ and customer‑facing incident communications during critical situations
Coach and support L2 engineers during live incidents, providing guidance and oversight without unnecessarily taking over ownership

Advanced Production Operations & IaC

Operate confidently and independently in production environments with broad system‑level awareness
Design, review, and approve complex production changes using Infrastructure as Code as the default mechanism
Apply deep expertise across:
- Terraform
- Helm & Kubernetes manifests
- GitOps workflows
- CI/CD and deployment pipelines

Partner with Engineering and DevOps to:

Improve deployment safety and rollback strategies
Define operational guardrails and blast‑radius controls
Influence platform architecture with operability and resilience in mind

AI-Driven Operations, Automation & Tooling

Lead adoption of AI‑augmented operations across the support organization
Define and evolve:
- AI‑assisted incident triage and prioritization
- Automated and semi‑automated runbooks
- Intelligent alert correlation and noise reduction
Use AI and automation to:
- Reduce mean time to detect (MTTD) and resolve (MTTR)
- Identify systemic patterns across incidents and customers
- Improve the quality and consistency of incident communications
Champion an automation‑first mindset, identifying opportunities where manual operational work should be eliminated entirely

Operational Readiness & Live Event Excellence

Own operational readiness for high‑risk, high‑visibility customer events
Lead pre‑event planning and validation, including:
- Architecture and risk reviews
- Runbook and escalation path validation
- Monitoring, alerting, and SLO coverage assessment
Design and rehearse incident response strategies for worst‑case scenarios
Act as a trusted operational advisor to strategic customers before, during, and after major events

On‑Call & 24/7 Operations

Participate in a 24/7 on‑call rotation, including…