Site Reliability Developer
Listed on 2026-01-07
IT/Tech
Cloud Computing, Systems Engineer, IT Support
Site Reliability Developer 3
Oracle Cloud Infrastructure (OCI) – OCI National Security Regions
Locations:
Reston, VA | Seattle, WA | Austin, TX
Job Description
OCI builds the future of the cloud for enterprises, combining the speed and agility of a start‑up with the scale and customer focus of a leading enterprise software company. We are focused on equity, inclusion, respect, learning, and continuous growth. You will join a talented, diverse team that owns end‑to‑end platform reliability and drives automation, security, and performance at scale.
Responsibilities
- System Design and Operation
- Design and manage distributed Unix‑based systems, primarily Oracle Linux.
- Implement auto‑scaling and self‑healing infrastructure for high uptime and durability.
- Tune kernel, networking, and file system parameters for optimal performance.
- Maintain OS patching and compliance across environments.
- Integrate with enterprise identity services (Active Directory, LDAP, Kerberos).
- Automation & Infrastructure as Code
- Develop and maintain automation using Ansible and Terraform.
- Automate deployment pipelines, service configuration, and patch management.
- Write Python and Bash scripts to enhance infrastructure delivery.
- Extend platform APIs and automation for repeatability and efficiency.
- Observability & Incident Response
- Build observability stacks with Prometheus, Grafana, and other telemetry tools.
- Create dashboards and SLO/SLI‑based alerts for real‑time monitoring.
- Participate in a global 24/7 on‑call rotation and lead high‑severity incident response.
- Conduct post‑incident RCA and drive long‑term reliability improvements.
- Collaboration & Standards
- Partner with development teams to embed reliability in deployment pipelines.
- Define system architecture standards and maintain robust platform documentation.
- Mentor engineers on Unix performance, observability, and debugging practices.
- Champion a culture of automation, resilience, and continuous improvement.
Qualifications
- U.S. Government TS/SCI clearance with polygraph.
- U.S. citizenship (required by the Federal Government customer).
- Bachelor’s or Master’s degree in CS or related engineering field.
- 5+ years of experience in software development or IT operations.
- Deep expertise with Unix/Linux systems, especially Oracle Linux.
- Experience with kernel tuning, performance profiling, and debugging complex system issues.
- Proficiency in Python and Bash scripting.
- Strong grasp of IaC tools such as Ansible and Terraform.
- Experience with hybrid infrastructure (on‑prem VMware, containers, Kubernetes).
- Hands‑on experience with monitoring, telemetry, and observability stacks.
- Excellent problem‑solving, communication, and collaboration skills.
- Self‑motivated and able to work independently in a distributed environment.
- Container virtualization (Docker, Kubernetes) experience.
- Continuous integration platforms (Jenkins) experience.
- Monitoring and alerting technologies (Prometheus, Grafana).
- PostgreSQL experience, including replication, failover, and backups.
- Git experience.
Base salary range: $79,100 – $158,200 per year (Reston, VA; Seattle, WA; Austin, TX). May be eligible for bonus and equity.
Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, and protected veteran status. All qualified applicants with exemptions from COVID‑19 related immunization and occupational health mandates will be considered on a case‑by‑case basis.
Certain U.S. customer or client‑facing roles may require compliance with applicable requirements such as immunization and occupational health mandates.