Sr Operational Support Engineer Job, Atlanta area, Georgia, USA, IT/Tech

Position: Sr Staff Operational Support Engineer

The Senior Operational Support Engineer at Dolby OptiView is responsible for providing technical and operational leadership for high-impact production scenarios in live video streaming, advertising, player, and real-time delivery platforms.

Key Responsibilities

Incident Leadership & Escalations
- Serve as the final operational escalation point for severe, complex, or prolonged customer-impacting incidents.
- Lead resolution of multi-system, multi-team incidents spanning streaming pipelines, player platforms, ad insertion, DRM, CDN, and real-time services.
- Own incident command during major live events, including decision-making under pressure and risk-based trade-offs.
- Drive high-quality, executive- and customer-facing incident communications during critical situations.
- Coach and support L2 engineers during live incidents, providing guidance and oversight without taking ownership away unnecessarily.
Advanced Production Operations & IaC
- Operate confidently and independently on production environments with broad system-level awareness.
- Design, review, and approve complex production changes using Infrastructure as Code as the default mechanism.
- Apply deep expertise across Terraform, Helm and Kubernetes manifests, GitOps workflows, and CI/CD and deployment pipelines.
- Partner with Engineering and DevOps to:
  - Improve deployment safety and rollback strategies.
  - Define operational guardrails and blast-radius controls.
  - Influence platform architecture with operability and resilience in mind.
AI-Driven Operations, Automation & Tooling
- Lead adoption of AI-augmented operations across the support organization.
- Define and evolve AI-assisted incident triage and prioritization, automated and semi-automated runbooks, intelligent alert correlation, and noise reduction.
- Use AI and automation to reduce mean time to detect (MTTD) and mean time to resolve (MTTR), identify systemic patterns across incidents and customers, and improve the quality and consistency of incident communications.
- Champion an automation-first mindset, identifying opportunities where manual operational work should be eliminated entirely.
Operational Readiness & Live Event Excellence
- Own operational readiness for high-risk, high-visibility customer events.
- Lead pre-event planning and validation, including architecture and risk reviews, runbook and escalation path validation, monitoring, alerting, and SLO coverage assessment.
- Design and rehearse incident response strategies for worst‑case scenarios.
- Act as a trusted operational advisor to strategic customers before, during, and after major events.
On-Call & 24/7 Operations
- Participate in a 24/7 on-call rotation, including nights, weekends, and holidays, as part of a global support model.
- Ensure smooth handovers between shifts and regions.
- Respond to critical alerts within defined SLAs for stream health, player errors, and delivery infrastructure.
Root Cause & Continuous Improvement
- Perform or contribute to root cause analysis (RCA) for production incidents.
- Document findings, corrective actions, and preventive measures.
- Identify recurring issues and work with Engineering and Product teams to eliminate them permanently.
- Contribute to and improve runbooks, operational playbooks, and knowledge bases for all OptiView products.
Collaboration & Engineering Feedback Loop
- Work closely with Engineering teams to expedite defect resolution, validate fixes, and support production deployments.
- Provide feedback on system observability, tooling gaps, and operational risks.
- Act as the operational voice during post-incident reviews.

Required Skills & Experience

Technical Skills
- 8+ years of relevant experience in operational, support, or similar customer‑facing roles.
- Proven ability to own complex problems end‑to‑end and operate with a high degree of autonomy.
- Experience influencing decisions and outcomes beyond individual contribution.
- Deep experience operating and supporting large-scale, production video streaming platforms.
- Solid troubleshooting skills across distributed systems (APIs, microservices, cloud infrastructure).
- Expert understanding of HLS, DASH, CMAF, WebRTC, DRM, and CDN architectures.
- Advanced experience working with monitoring, alerting, and logs to diagnose live…