Operational Support Engineer
Job in
Atlanta, Fulton County, Georgia, 30301, USA
Listed on 2026-06-01
Listing for:
Dolby Sound Laboratories
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, IT Support, SRE/Site Reliability
Job Description & How to Apply Below
Join the leader in entertainment innovation and help us design the future. At Dolby, science meets art, and high tech means more than computer code. As a member of the Dolby team, you'll see and hear the results of your work everywhere, from movie theaters to smartphones. We continue to revolutionize how people create, deliver, and enjoy entertainment worldwide. To do that, we need the absolute best talent.
We're big enough to give you all the resources you need, and small enough so you can make a real difference and earn recognition for your work. We offer a collegial culture, challenging projects, and excellent compensation and benefits, not to mention a Flex Work approach that is truly flexible to support where, when, and how you do your best work.
The Dolby Cloud Solutions organization builds technologies and innovations that easily integrate into service providers' infrastructure to make content experiences more effective, meaningful, and engaging for consumers.
Dolby OptiView is building a dedicated Operational Support (L2) team responsible for the stability,
availability, and operational excellence of our 24/7 live video streaming, ads, player, and real-time
delivery platforms.
As an Operational Support Engineer (L2), you take end-to-end ownership of customer-impacting
production incidents once they are triaged by Level 1 support. You operate directly on production
systems, lead live incident resolution, and act as the operational bridge between Support, Engineering,
DevOps, and customers, particularly during high-impact live events.
This is a hands-on, customer-facing role focused on incident ownership, production operations,
automation, and operational scalability, not just reactive troubleshooting.
Key Responsibilities
Incident & Operational Support
* Take ownership of escalated customer issues from Level 1 Support and drive them to resolution.
* Troubleshoot and resolve complex, high-impact production incidents affecting live streams,
VOD playback, ad insertion, DRM, and real-time WebRTC services.
* Operate directly on production environments, including configuration changes, CDN
adjustments, and corrective actions, following established operational procedures, including
executing mitigations and emergency changes during live incidents when customer impact
requires immediate action.
* Lead or actively contribute to live incident bridges involving customers, internal teams, and
partners.
* Provide clear, timely communication during incidents, including status updates and
customer-facing explanations.
Infrastructure as Code & Production Operations
* Work fluently with Infrastructure as Code (IaC) to understand, troubleshoot, and safely modify
production environments
* Leverage tools and frameworks such as:
* Terraform
* Helm
* Kubernetes manifests
* GitOps workflows
* CI/CD and deployment pipelines
* Use IaC as the primary mechanism for safe, auditable, and repeatable operational changes
* Collaborate with Engineering and DevOps to improve deployment reliability and operational
safety
* Validate and execute infrastructure or configuration changes through codified workflows
AI-Driven Operations & Automation
* Leverage AI tools and automation to enhance operational efficiency and incident response.
* Contribute to and use:
* AI-assisted incident triage and classification
* Automated runbook execution
* AI-based pattern detection across incidents
* Intelligent alert correlation and noise reduction
* Use AI to:
* Generate or improve incident communications
* Accelerate troubleshooting workflows
* Identify recurring patterns and systemic issues
* Drive adoption of automation-first and AI-augmented operational practices
Pre-Event Planning & Operational Readiness
* Participate in pre-event readiness planning for critical customer events
* Validate system readiness through:
* Runbook checks
* Monitoring coverage validation
* Risk identification and mitigation planning
* Define and rehearse incident response strategies for high-risk scenarios
* Collaborate with customers and internal teams to ensure smooth event execution
On-Call & 24/7 Operations
* Participate in a 24/7 on-call rotation, including nights, weekends, and holidays, as part of a
global support model
* Ensure smooth handovers between shifts and regions
* Respond to critical alerts within defined SLAs for stream health, player errors, and delivery
infrastructure.
Root Cause & Continuous Improvement
* Perform or contribute to root cause analysis (RCA) for production incidents
* Document findings, corrective actions, and preventive measures
* Identify recurring issues and work with Engineering and Product teams to eliminate them
permanently
* Contribute to and improve runbooks, operational playbooks, and knowledge bases for all
OptiView products (player, ads, live, and real-time streaming)
Collaboration & Engineering Feedback Loop
* Work closely with Engineering teams to escalate defects, validate fixes, and support production
deployments
* Provide feedback on system observability, tooling gaps, and operational…