Software Engineering Director, Production Support Operations

Job in Atlanta, Fulton County, Georgia, 30383, USA
Listing for: Cooper Lighting Solutions
Full Time position
Listed on 2026-05-24
Job specializations:
  • IT/Tech
    IT Support, Systems Engineer, IT Project Manager, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 120,000 - 160,000 USD yearly
Job Description & How to Apply Below
Position: SOFTWARE ENGINEERING DIRECTOR I, Production Support Operations

Director of Production Support

The Director of Production Support leads teams responsible for ensuring the stability, resilience, and operational excellence of critical technology platforms supporting core lines of business. This role owns end‑to‑end production support operations while driving maturity toward engineering‑first, site reliability–focused practices. The Director identifies and resolves complex technical, operational, risk, and organizational challenges, while building high‑performing, accountable teams across onshore and offshore locations.

This position carries full people management responsibility, including hiring, coaching, performance management, and disciplinary actions, and serves as a key partner to Technology, Risk, and Business leadership.

ESSENTIAL DUTIES AND RESPONSIBILITIES

Production Support Leadership & Accountability

Own end‑to‑end production support operations for multiple mission‑critical applications supporting key lines of business, ensuring availability, stability, and performance meet defined SLAs and SLOs. Provide accountable, visible leadership for 24x7 operational support, including on‑call models, escalation paths, and incident response effectiveness. Act as the senior escalation point for major incidents, ensuring swift recovery, accurate root cause analysis, and durable remediation.

Incident & Problem Management

Lead cross‑functional incident recovery efforts in partnership with Incident Management, engineering teams, infrastructure, and business stakeholders. Ensure timely root cause analysis (RCA), post‑incident reviews, and corrective actions that prevent recurrence. Establish and mature a production knowledge base, documenting known issues, recovery procedures, and architectural insights.

Engineering‑First & SRE Practices

Drive adoption of Site Reliability Engineering (SRE) and lean engineering principles, including:

  • Reduction of toil through automation
  • Engineering‑based reliability metrics (error budgets, SLIs/SLOs)
  • Proactive resilience and failure prevention practices

Champion automation of repetitive and manual operational tasks, including incident detection, response, validation, and recovery where feasible. Promote a culture of preventative engineering, partnering with development teams to improve system reliability upstream.

Monitoring, Observability & AI Enablement

Implement and continuously improve real‑time monitoring, alerting, and observability across applications and infrastructure. Measure and optimize the effectiveness of monitoring and alerting to eliminate noise and accelerate mean‑time‑to‑detect and mean‑time‑to‑recover. Leverage AI and advanced analytics to correlate telemetry data (logs, metrics, traces) and proactively identify emerging risks and root causes. Champion the safe and responsible use of AI within production operations by adhering to enterprise guardrails and protecting sensitive data and system integrity.

Operational Readiness & Change Enablement

Oversee operational readiness across releases, disaster recovery and failover testing, and certificate and dependency lifecycle management. Ensure production support is actively embedded in change planning, minimizing risk from releases and infrastructure changes.

People, Vendor & Financial Management

Lead one or more Agile teams (Scrum, Kanban), including onshore and offshore engineers, fostering high performance and accountability. Manage workforce vendors and partners, setting expectations, reviewing performance, and ensuring delivery quality. Own the budget and staffing plan, aligned to application criticality, operational risk, and business growth objectives.

Risk Management & Governance

Act as the first line of defense in production operations by proactively identifying and mitigating technology, operational, and resiliency risks. Partner effectively with second‑line Risk, Audit, and Regulatory teams, ensuring findings are addressed and controls are continuously improved. Ensure compliance with internal policies, regulatory requirements, and external audit expectations. Own and drive remediation plans for risk, audit, and regulatory findings, ensuring timely, effective and sustainable…
