Director, Data Center Facility Operations - Saline Township, MI
Listed on 2026-05-01
- Management
- IT/Tech
MI, United States
- Job Identification: 330461
- Job Category: Technology Operations
- Posting Date: 03/27/2026, 07:07 PM
- Role: People Manager
- Job Type: Regular Employee
- Does this position require a security clearance? No
- Years: 10+ years
- Additional Info: Visa / work permit sponsorship is not available for this position
- Applicants are required to read, write, and speak the following languages: English
Leads enterprise-wide performance monitoring and real-time operational governance, ensuring standardized processes for shift operations, event management, escalation, incident command, and communications. Oversees capacity and readiness for critical infrastructure (power, cooling, controls, life safety, and physical security), ensuring sites are resilient, compliant, and audit-ready.
Partners with executive leadership on multi-year operational, reliability, and financial targets; drives adoption of automation, telemetry, and predictive maintenance to reduce risk and improve mean time to restore (MTTR). Establishes crisis management standards, continuous improvement mechanisms, and a culture of operational excellence, knowledge sharing, and accountability.
Leads major expansion and transformation initiatives impacting operational readiness, serves as senior liaison across regions, and oversees the full lifecycle of critical infrastructure and hardware assets—including install, maintenance strategy, spares, vendor performance, and investment governance—to optimize reliability, security, and scalability.
Key Responsibilities
24/7 Mission Critical Operations Leadership
Owns 100% uptime operations for a portfolio of very large/complex data center sites, ensuring consistent execution of shift coverage, operational handoffs, and standardized runbooks.
Establishes and governs the Mission Critical Operations (MCO) operating model: command structure, on-call rotations, escalation paths, and service-impacting event response.
Ensures operational readiness for high-severity incidents through drills/tabletops, incident commander training, and continuous improvement of response playbooks.
Performance Monitoring, Controls, and Reliability
Defines the enterprise strategy for real-time monitoring and operational health across the portfolio (BMS/EPMS/SCADA/telemetry), aligning KPIs to uptime, reliability, safety, and customer outcomes.
Drives operating rhythms for reviewing: availability, MTTR/MTBF, alarm quality, repeat events, maintenance effectiveness, and risk posture.
Establishes standards for preventive and predictive maintenance, MOP/SOP/EOP quality, change control, and operational compliance.
Incident, Problem, and Crisis Management
Governs standards for event triage, incident command, escalation, stakeholder communications, and customer-impacting notifications.
Leads post-incident reviews for P1/P0 events, ensuring root cause analysis (RCA) quality, corrective/preventive actions (CAPA), and verified closure.
Operates as executive escalation point for highly complex incidents and cross-regional reliability risks.
Capacity, Resiliency, and Site Readiness
Oversees evaluation of power, cooling, physical space, network/support infrastructure, and security capacity, ensuring readiness for load growth and peak conditions.
Ensures resiliency standards are met (redundancy, maintenance windows, failover testing, generator/UPS readiness, fuel strategy as applicable).
Directs operational risk assessments and ensures sites remain audit-ready and compliant with applicable standards and internal controls.
Automation and Operational Tooling
Drives adoption of automation for alarm correlation, workflow orchestration, remote operations, and predictive analytics to reduce human error and improve response times.
Standardizes data quality and instrumentation required for high-confidence operational decision-making.
Expansion, Launch, and Transformation (Operational Readiness Focus)
Leads operational support for expansions/new builds/site launches, ensuring Day-0/Day-1 readiness, staffing, training, spares, procedures, and turnover acceptance criteria.
Partners with engineering and…