Data Center Operations Engineer
Listed on 2026-02-10
IT/Tech
Systems Engineer
Description
Seeking an experienced Data Center Operations Engineer to keep environments running with precision, efficiency, and high uptime across global sites. This role bridges IT and facilities, maintaining the power, cooling, and compute systems that sustain the company’s world-class AI platforms. It requires a technically strong, detail-oriented engineer who thrives in high-availability environments, understands the full stack of data center infrastructure (compute, network, power, and cooling), and takes pride in systems that run flawlessly because of their work.
The ideal candidate communicates clearly, performs methodically under pressure, and collaborates effectively across IT, facilities, and vendor teams. This role calls for a builder, a problem-solver, and a guardian of uptime: someone who values precision, safety, and accountability in every aspect of operations.
- Own the day-to-day reliability and performance of company data centers, supporting both IT and facility infrastructure. This includes installing and configuring servers and compute equipment, managing structured cabling, and performing Layer 1–3 troubleshooting across compute and network layers.
- Partner closely with colocation and data center providers to maintain uptime: review maintenance procedures, coordinate planned work, validate redundancy during transitions, and verify site health after power or cooling events.
- Work alongside facilities teams to operate and maintain critical power and cooling systems, including transformers, PDUs, UPS systems, switchgear, generators, CRAC and CRAH units, CDUs, chillers, cooling towers, and containment systems. Assist in capacity planning, preventive maintenance, and load balancing across power and cooling zones to maintain safe, efficient, and redundant operations.
- Lead incident response and root‑cause analysis, refine standard operating procedures, and implement automation to improve efficiency and consistency across company data centers worldwide.
- 10+ years in data center compute operations, facilities, or infrastructure engineering and/or a degree in an Engineering or Computer Science discipline
- Hands‑on experience with servers, networking, and structured cabling
- Working knowledge of electrical systems including transformers, PDUs, UPS systems, switchgear, and generators
- Understanding of cooling systems including CRAC/CRAH units, CDUs, cooling towers, chillers, and containment environments
- Familiarity with Linux and basic scripting (Bash, Python, Ansible)
- Proficiency with network CLIs (Cisco, Arista, Juniper)
- Experience collaborating with colocation providers and reviewing MOPs/EOPs for electrical and mechanical work
- Proficiency with ITSM/DCIM platforms (e.g., Jira, ServiceNow, NetBox, Sunbird)
- Ability to manage server, switch, router, storage, and hardware lifecycle processes
- Ability to update asset management systems using scanners and inventory tools
- Strong documentation, troubleshooting, and communication skills for ticketing, customer communication, and team coordination
- Strong multitasking, adaptability, and time‑management skills with a focus on quality and throughput
- Must be punctual, reliable, and well‑organized
- Must have strong interpersonal and teamwork skills, with the ability to work independently when needed
- Willingness to support on‑call rotation and meet a 60‑minute on‑site SLA
- Ability to safely lift 50–75 lbs and remain on feet for the majority of the workday
- Ability to operate material‑handling equipment (pallet jacks, forklifts, server lifts)
- Demonstrated ability to learn new systems, methodologies, software, and hardware platforms
- Experience working in high‑tempo, high‑stress environments
- Experience leading and/or mentoring more junior staffers
- Domain expert in one or more of the following functional areas:
  - Datacenter power systems
  - Datacenter cooling/HVAC systems
  - Server or liquid cooling
  - Network routing and switching
  - Late‑generation flash storage arrays
  - Facility and/or network security
  - Network infrastructure monitoring
- Ability to project manage key datacenter‑centric initiatives
- Able to effectively present data to senior leadership
- Familiar with datacenter key performance indicators (KPIs)
- Ability to manage outage events
- Be a good person and a good teammate
- CompTIA Server+, Network+, or Linux+
- ITIL Foundation certification
- Networking: CCNA, JNCIA, ACE‑A