Data Center Reliability Engineer
Listed on 2026-02-19
Engineering
Systems Engineer, Electrical Engineering
IT/Tech
As a Reliability Engineer, you will apply data-driven analysis and engineering problem-solving to improve availability and reduce risk across mission-critical facility systems. You will identify failure patterns early, drive corrective actions, and build tooling and metrics that improve reliability. This role manages ongoing critical environment maintenance by completing standard diagnostics and repairs and resolving issues. The role also manages incidents impacting services and conducts root cause analysis to mitigate recurrence and improve system resilience.
Conducts data center build site reviews and assessments in collaboration with other teams to evaluate suitability for data center builds. Supports and validates on-site data center operations in relation to the electrical or mechanical infrastructure. Coordinates with internal and external project team members in delivering specific aspects of data centers or part-data centers for Oracle.
- Monitor and analyze operational telemetry, alarms, and performance trends to identify emerging risks and reliability degradation.
- Define and track reliability KPIs; deliver concise analysis and recommendations that drive operational and engineering decisions.
- Develop and maintain analytics and reporting tools using Python, SQL, and/or DCIM/BMS/SCADA data sources.
- Support and/or lead RCAs and corrective action tracking for recurring or high-impact issues, ensuring follow-through and verification.
- Partner with operations and engineering teams to improve preventive strategies, automation opportunities, and compliance execution.
- Contribute to reliability standards and documentation that improve repeatability across sites.
- Experience in reliability or systems analysis in data centers or other uptime-critical environments (utilities, telecom, manufacturing).
- Engineering degree or equivalent applied experience; comfort with data and tooling is essential.
- Strong analytical and visualization skills; disciplined technical documentation.
- Able to influence outcomes through evidence, clarity, and structured thinking.
- Global impact at scale: Contribute directly to how mission-critical OCI data centers operate across regions and continents, influencing infrastructure reliability, security, sustainability, and long-term capacity growth.
- Technically rigorous environment: Work alongside experienced engineers, automation specialists, and compliance teams in a rapidly scaling hyperscale cloud infrastructure, where disciplined execution and technical depth matter.
- Culture built on operational excellence: Join an organization that values safety, process rigor, clear accountability, and continuous improvement as foundational to protecting uptime and customer trust.
- Long-term career development: Benefit from internal mobility, role-based technical training, and development opportunities designed for professionals building long-term careers in cloud infrastructure and facilities operations.