Remote Principal Site Reliability Developer- USC

Remote / Online - Candidates ideally in
Ann Arbor, Washtenaw County, Michigan, 48103, USA
Listing for: Ll Oefentherapie
Remote/Work from Home position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 86,400 - 199,500 USD yearly
Job Description & How to Apply Below
Position: [Remote] Principal Site Reliability Developer- USC Required

Overview

Come and join us! Building on our cloud momentum, Oracle has formed a new organization: Oracle Health. This team focuses on product deployment, sustainability, troubleshooting, and product strategy while building a modern, automated healthcare platform. This is a net-new line of business with an entrepreneurial spirit, offering a unique opportunity to help build a world-class engineering organization centered on excellence, innovation, and real-world impact.

As a Site Reliability DevOps Engineer, you will play a critical role in operating and scaling a Clinical AI Assistant platform used by healthcare professionals worldwide. This system is designed to improve the quality, safety, and efficiency of care delivery for billions of patients globally. Your work will directly influence the reliability and performance of AI-driven systems that clinicians depend on in high-stakes environments.

This role goes beyond traditional SRE responsibilities—you will have the opportunity to leverage AI/ML techniques and develop AIOps solutions to proactively manage system reliability, detect anomalies, automate remediation, and continuously improve service performance. You will help define how reliability engineering evolves in the context of intelligent, AI-powered healthcare systems.

You will be responsible for architecture, production operations, capacity planning, performance management, deployment, and release engineering, working across cross-functional teams to deliver highly reliable, scalable, and secure services.

Responsibilities
  • Own the architecture, design, implementation, and production operations of core platform and AI-driven system services

  • Ensure the reliability, availability, and performance of the Clinical AI Assistant platform used in real-world healthcare settings

  • Build and operate AIOps-driven capabilities (e.g., intelligent alerting, anomaly detection, automated remediation, predictive scaling)

  • Continuously improve systems through automation, self-healing mechanisms, and real-time observability

  • Design and develop software to enhance system scalability, efficiency, and resilience

  • Partner with cross-functional teams to prototype and deliver new platform services

  • Lead efforts in capacity planning, demand forecasting, performance tuning, and cost optimization

  • Solve complex distributed systems challenges in cloud-native environments and prevent recurrence through engineering rigor

  • Contribute to platform engineering best practices, including infrastructure as code, CI/CD, and service reliability standards

  • Stay current with emerging technologies in cloud, distributed systems, and AI/ML-driven operations

Key Requirements / Experience

Must-have:

  • Ability to obtain and maintain a federal security clearance (US citizenship required)

  • 8+ years of experience in Site Reliability Engineering, DevOps, or related roles

  • Proven experience operating large-scale, distributed, production systems with high availability requirements

  • Strong experience with container orchestration (Kubernetes, Docker, or similar)

  • Infrastructure as Code expertise (Terraform, Ansible, Chef, Puppet, Packer, etc.)

  • Experience building and operating CI/CD pipelines (Git, Jenkins, GitLab, Rundeck, etc.)

  • Proficiency in scripting and automation (Bash, Python, PowerShell, etc.)

  • Experience with at least one major cloud provider (OCI, AWS, Azure, etc.)

  • Strong Linux systems expertise

  • Experience with observability tooling (monitoring, logging, tracing) and performance optimization

Nice-to-have:

  • Experience supporting or operating AI/ML or LLM-based systems in production

  • Exposure to AIOps, intelligent automation, or ML-driven observability

  • Experience in healthcare or other regulated environments (HIPAA, security, compliance)

  • Background in high-throughput, low-latency systems supporting mission-critical workloads

  • Software engineering experience in Java, Python, C++, or similar languages

Benefits
  • US:
    Hiring Range in USD from: $86,400 to $199,500 per annum. May be eligible for bonus and equity.

  • Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect…
