Principal Big Data Site Reliability Developer; US REMOTE
Phoenix, Maricopa County, Arizona, 85003, USA
Listed on 2026-05-05
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
Job Description
This role requires U.S. Citizenship and eligibility for a Federal Security Clearance.
Our team focuses on product development and strategy for Oracle Health, building a modernized, automated healthcare platform.
About the Job
As Principal Site Reliability Engineer (SRE), you provide technical leadership for the core data platforms that underpin Oracle Health’s Data & Analytics Platform.
Opportunity
- Reach billions of people with our products & services
- Create technology that truly impacts the world
- Have an immediate impact on developing technology
- Unlimited growth potential with inspiring work
- Work with the best minds in the industry
- Enjoy working in an open, diverse, and productive environment
- Own the end-to-end reliability, scalability, and operability of shared data platforms
- Define platform standards, architectural direction, and operational guardrails
- Influence cross‑team technical decisions and long‑term platform strategy
- Drive long‑term platform evolution and influence reliability strategy across the data ecosystem
- Lead platform architecture and design reviews
- Clearly articulate system behavior, dependencies, and failure modes
- Make principled trade‑offs between reliability, performance, cost, and complexity
- Provide guidance and guardrails that enable downstream teams to use platforms safely and effectively
- Establish capacity models, scaling strategies, and operational best practices
- Design platforms that behave predictably under load, failure, and change
- Own platform lifecycle events: upgrades, expansions, decommissioning, and recovery
- Operate and evolve stateful distributed systems where data placement, replication, and recovery are critical
- Reason about failure modes such as back pressure, rebalancing, region movement, replication lag, and rolling upgrades
- Operate and maintain Kerberized platforms, including authentication, authorization, and secure service‑to‑service communication
- Treat security as a first‑class architectural concern
- Design and evolve an Ansible‑ and Terraform‑driven automation framework
- Treat automation as production software: versioned, reviewed, tested, and improved
- Eliminate operational toil by encoding reliability and safety into the platform
- Serve as the ultimate escalation point for complex or ambiguous incidents
- Focus on eliminating entire classes of failure, not just resolving individual issues
- Represent SRE and platform engineering in high‑visibility and sensitive forums
- Communicate clearly with engineering leadership and partner teams
The team operates within the Oracle Health Data & Analytics Platform, supporting one of Oracle Health’s core products, HealtheIntent. We operate the big data and streaming infrastructure that enables downstream teams to deliver reliable customer‑facing solutions at scale, while continuously improving operability and efficiency.
Required Experience
- 8+ years operating large‑scale, customer‑facing distributed platforms
- Deep experience with HDFS, YARN, HBase, Kafka, Storm, or similar systems
- Strong background in Linux, networking, and distributed system troubleshooting
- Infrastructure‑as‑Code using Ansible and Terraform
- Scripting and automation using Python, Ruby, and Bash
- Hands‑on experience operating Kerberized environments
- Proven ability to define and document technical architecture for complex systems
- Demonstrated ownership of shared platforms with broad blast radius and multiple downstream consumers
- Experience designing observability and capacity models for distributed platforms
- U.S. Citizenship and eligibility for a Federal Security Clearance
- 10+ years of technical experience relevant to this position
- Ability to communicate effectively and build rapport with team members
- BS or MS in Computer Science, or equivalent
IC4
Benefits
Salary range: $86,400 - $199,500 per year. You may be eligible for a bonus and equity.
- Medical, dental, and vision insurance
- Short‑term and long‑term disability
- Life insurance and AD&D
- Supplemental life insurance…