Site Lead, Data Center Operations, JoinOCI
Listed on 2026-05-18
IT/Tech
Data Engineering
Job Description
Independently responsible for one or more data centers, leading performance analyses across key operational areas and proactively monitoring facility health to implement significant enhancements. Drives process improvements by partnering across functions and regions, leads on‑ground teams in incident resolution, manages escalated technical issues, and utilizes advanced automation and monitoring tools to mitigate risks. Maintains an up‑to‑date knowledge base, executes incident management protocols, and conducts root cause analysis to improve operations.
Oversees new region builds and expansions, serves as the main liaison for expansion projects, and provides oversight for installations, repairs, inventory, and logistics—directing component upgrades and infrastructure changes to optimize data center efficiency and stability.
- Independent responsibility for at least one, occasionally multiple, Data Centers.
- Lead performance trend analyses related to capacity, temperature, availability, cleanliness, and other aspects; identify significant patterns and suggest operational improvements.
- Proactively monitor facility health at all times (power, cooling, security) and develop and implement major enhancements.
- Partner across functions and regions to identify, measure, and improve processes in alignment with industry best practices (Lean, Six Sigma), lead significant improvement projects, and ensure alignment with strategic objectives.
- Lead on‑ground resources to resolve incidents and communicate accurately on execution.
- Oversee and provide support for escalated complex technical issues.
- Triage and/or escalate issues, and implement advanced automation, scheduling, and monitoring tools to mitigate potential problems effectively.
- Identify, document, and validate issues, processes, and solutions, ensuring the data center knowledge base is comprehensive and up‑to‑date.
- Prepare for and execute incident or crisis management protocols in alignment with business continuity plans.
- Perform Root Cause Analysis (RCA) following crises or incidents, updating documentation to capture process improvements.
- Lead and oversee new region builds and expansion activities, both onsite and remotely.
- Act as primary liaison with project teams and data center engineering, ensuring all timelines and capacity needs are strategically managed for expansion projects and site builds.
- Collaborate closely with project teams on critical aspects of expansion projects and site builds to deliver them to a high standard.
- Provide oversight for installations, repairs, inventory management, and logistics tasks.
- Direct efforts to replace and upgrade components; advise on high‑level purchases or upgrades for data centers and oversee implementation.
- Lead planning and execution of rack deployments, installations, and network physical infrastructure upgrades/changes.
- Ensure proactive maintenance of the Data Center facility with regard to efficiency and stability (containment, airflow & pressure, power trains).
- Manage and coordinate moderately complex tasks, monitoring timelines and deliverables to ensure timely completion and adherence to requirements for a moderately sized project or initiative.
- Collaborate across the organization to align on expectations and achieve shared objectives.
- Identify and address moderately complex issues by analyzing data and/or information to find solutions in accordance with standard practices; escalate unresolved or critical issues with a thorough assessment and suggestions.
- Pursue learning opportunities to expand knowledge of, and skills with, new areas and tools, and stay abreast of the latest industry trends and best practices; coach and mentor junior team members.
- Develop ideas, recommend updates, and/or collaborate on the implementation of process improvements across teams, evaluating impact on key stakeholders.
- Contribute to the talent development pipeline by participating in candidate interviews, assessing candidates, and providing hiring recommendations.
Certifications, technical experience, and knowledge in data center operations, incident management, automation, performance…