Principal Site Reliability Engineer
Listed on 2026-02-16
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
About Oracle Cloud
Oracle Cloud is a comprehensive suite of cloud services (infrastructure, platform, and applications) designed to help organizations build, deploy, and manage workloads securely. At Oracle, we are building the most intelligent future of cloud computing. Our team is composed of talented, motivated, and diverse individuals committed to empowering our customers to accomplish their most important missions using Oracle Cloud Fusion Applications. We center our work around our customers’ needs, striving to continuously enhance our cloud capabilities based on their challenges.
About the Team
Join the Fusion Site Reliability Engineering Middleware (FSRE-MW)—a critical group dedicated to maintaining the high availability of Oracle’s Cloud Fusion Applications. We minimize the frequency and duration of customer-impacting events through large-scale incident management and automation. As a team, we combine the agility of a start-up with the scale and customer focus of a leading enterprise software company.
As a Principal Site Reliability Engineer, you will be a key member of a high-impact team focused on the availability, performance, and operational excellence of Fusion SRE Middleware. You will take ownership of production environments—including systems and the Fusion Middleware stack—and support mission-critical business operations for Cloud Fusion Applications. Your role will emphasize automation and optimization of operations across multiple production environments, recommending AI-driven solutions to enhance availability, performance, and supportability.
You will harness AI-based tools and predictive analytics to proactively identify issues, automate incident responses, and continuously improve system resilience. Additionally, you will provide escalation support for complex production problems, guide junior engineers, participate in major incident bridges, and help build and refine processes and procedures using AI-powered insights to drive smarter, data-driven decisions.
Our team is front-and-center in reducing event duration, leveraging operational experience, best practices, and tool development to automate incident management and drive continual improvement.
About the Role
We seek a Principal SRE to join our globally distributed team, responsible for detecting, triaging, and mitigating service-impacting events rapidly and effectively through automation and AI-powered insights. You will be part of a regional team, minimizing Fusion services’ downtime through exceptional incident management and system operations, with a strong emphasis on scalability, performance, security, and AI-driven optimization. In this dynamic role, you will gain deep insight into the inner workings of Oracle Cloud Fusion Apps, using AI tools to predict, identify, and address potential issues before they impact services.
You’ll influence cross-functional leaders and drive programs that boost service availability while leveraging AI to enhance real-time decision-making and improve operational efficiency.
Career Level: IC4