Operational Support Engineer (L2) Job, Atlanta area, Georgia, USA, IT/Tech

Position: Staff Operational Support Engineer (L2)

Overview

Protingent Staffing has an exciting contract opportunity for a Staff Operational Support Engineer (L2) with our client located in Atlanta, GA.

Job Description: As an Operational Support Engineer (L2), you take end-to-end ownership of customer-impacting production incidents once they are triaged by Level 1 support. You operate directly on production systems, lead live incident resolution, and act as the operational bridge between Support, Engineering, DevOps, and customers, particularly during high-impact live events. This is a hands-on, customer-facing role focused on incident ownership, production operations, automation, and operational scalability, not just reactive troubleshooting.

Job Responsibilities

Incident & Operational Support:
- Take ownership of escalated customer issues from Level 1 Support and drive them to resolution
- Troubleshoot and resolve complex, high-impact production incidents affecting live streams, VOD playback, ad insertion, DRM, and real-time WebRTC services
- Operate directly on production environments, making configuration changes, CDN adjustments, and corrective actions under established operational procedures, and executing mitigations and emergency changes during live incidents when customer impact requires immediate action
- Lead or actively contribute to live incident bridges involving customers, internal teams, and partners
- Provide clear, timely communication during incidents, including status updates and customer-facing explanations

Infrastructure as Code & Production Operations:
- Work fluently with Infrastructure as Code (IaC) to understand, troubleshoot, and safely modify production environments
- Leverage tools and frameworks such as Terraform, Helm, Kubernetes manifests, GitOps workflows, and CI/CD and deployment pipelines
- Use IaC as the primary mechanism for safe, auditable, and repeatable operational changes
- Collaborate with Engineering and DevOps to improve deployment reliability and operational safety
- Validate and execute infrastructure or configuration changes through codified workflows

AI-Driven Operations & Automation:
- Leverage AI tools and automation to enhance operational efficiency and incident response
- Contribute to and use AI-assisted incident triage and classification, automated runbook execution, AI-based pattern detection across incidents, and intelligent alert correlation and noise reduction
- Use AI to generate or improve incident communications, accelerate troubleshooting workflows, and identify recurring patterns and systemic issues
- Drive adoption of automation-first and AI-augmented operational practices

Pre-Event Planning & Operational Readiness:
- Participate in pre-event readiness planning for critical customer events
- Validate system readiness through runbook checks, monitoring coverage validation, and risk identification and mitigation planning
- Define and rehearse incident response strategies for high-risk scenarios
- Collaborate with customers and internal teams to ensure smooth event execution

On-Call & 24/7 Operations:
- Participate in a 24/7 on-call rotation, including nights, weekends, and holidays, as part of a global support model
- Ensure smooth handovers between shifts and regions
- Respond to critical alerts within defined SLAs for stream health, player errors, and delivery infrastructure

Root Cause & Continuous Improvement:
- Perform or contribute to root cause analysis (RCA) for production incidents
- Document findings, corrective actions, and preventive measures
- Identify recurring issues and work with Engineering and Product teams to eliminate them permanently
- Contribute to and improve runbooks, operational playbooks, and knowledge bases for all OptiView products (Player, ads, live and real-time streaming)

Collaboration & Engineering Feedback Loop:
- Work closely with Engineering teams to escalate defects, validate fixes, and support production deployments
- Provide feedback on system observability, tooling gaps, and operational risks
- Act as the operational voice during post-incident reviews

Job Qualifications

5+ years of relevant experience in operational, support, or similar customer-facing roles
Proven…