
Remote - Site Reliability Developer (USC)

Remote / Online - Candidates ideally in
Gaithersburg, Montgomery County, Maryland, 20877, USA
Listing for: Ll Oefentherapie
Remote/Work from Home position
Listed on 2026-06-08
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description & How to Apply Below
Position: Remote - Site Reliability Developer 3 (USC)

U.S. citizenship and eligibility for a federal security clearance are required.

Our Team

Building on our Cloud momentum, Oracle has formed a new organization - Oracle Health Data, Analytics Platform. This team will focus on product development and product strategy for Oracle Health, while building out a complete platform supporting modernized, automated healthcare. This is a net new line of business, built with an entrepreneurial spirit that promotes an energetic and creative environment. We are unencumbered and will need your contribution to make it a world-class engineering center with a focus on excellence.

Oracle Health Data, Analytics Platform has a rare opportunity to play a critical role in how Oracle Health products impact and disrupt the healthcare industry by transforming how healthcare and technology intersect.

You will have the opportunity to:

  • Reach billions of people with our products & services
  • Create technology that truly impacts the world
  • Have an immediate impact on developing technology
  • Pursue unlimited growth potential through inspiring work
  • Work with the best minds in the industry
  • Enjoy working in an open, diverse, and productive environment
About The Job

This role provides support to core data platforms behind Oracle Health’s Data & Analytics Platform. As a Senior Site Reliability Engineer (SRE), you will own shared, mission‑critical systems used by multiple products and teams.

You will work on the design and operation of large‑scale, stateful distributed platforms, including Hadoop ecosystem components (HDFS, YARN, HBase) deployed on Oracle Big Data Service (BDS), Kafka, and Storm. These multi‑tenant platforms are deployed and operated through Ansible‑ and Terraform‑based automation and require strong architectural ownership to manage scale, change, and broad blast radius.

What You’ll Do

Platform Ownership & Technical Leadership
  • Own the end‑to‑end reliability, scalability, and operability of shared data platforms
  • Define platform standards, architectural direction, and operational guardrails
  • Influence cross‑team technical decisions and long‑term platform strategy
  • Drive long‑term platform evolution and influence reliability strategy across the data ecosystem
Architecture & Design
  • Clearly articulate system behavior, dependencies, and failure modes
  • Make principled trade‑offs between reliability, performance, cost, and complexity
  • Provide guidance and guardrails that enable downstream teams to use platforms safely and effectively
Operations Engineering
  • Establish capacity models, scaling strategies, and operational best practices
  • Design platforms that behave predictably under load, failure, and change
  • Own platform lifecycle events: upgrades, expansions, decommissioning, and recovery
Distributed Systems Expertise
  • Operate and evolve stateful distributed systems where data placement, replication, and recovery are critical
  • Reason about failure modes such as back pressure, rebalancing, region movement, replication lag, and rolling upgrades
Security
  • Operate and maintain Kerberized platforms, including authentication, authorization, and secure service‑to‑service communication
  • Treat security as a first‑class architectural concern
Automation
  • Design and evolve an Ansible‑ and Terraform‑driven automation framework
  • Treat automation as production software: versioned, reviewed, tested, and improved
  • Eliminate operational toil by encoding reliability and safety into the platform
Incident Leadership & Prevention
  • Serve as the ultimate escalation point for complex or ambiguous incidents
  • Focus on eliminating entire classes of failure, not just resolving individual issues
Representation
  • Represent SRE and platform engineering in high‑visibility and sensitive forums
  • Communicate clearly with engineering leadership and partner teams