Senior Site Reliability Engineer
Listed on 2026-01-26
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability
Overview
Senior Site Reliability Engineer (SRE) responsible for ensuring the reliability, performance, and scalability of cloud-based AI applications for OCI Operations. Collaborates with development, operations, and security teams to automate processes, develop SRE standards, monitor system health, and maintain uptime for critical AI applications. Designs, automates, and maintains AI services supporting mission-critical AI and ML initiatives.
Responsibilities
- Design, implement, and maintain scalable, secure cloud infrastructure for AI applications on OCI
- Collaborate with Engineering teams to build robust automation to deploy, scale, and operate resilient systems
- Implement SRE best practices: SLO/SLI definition, error budgeting, automated monitoring, data integrity validation, and incident response for services
- Identify and own opportunities for automation and continuous improvement to keep systems highly scalable and reliable
- Design and optimize highly available services that are resilient to failures and service disruptions
- Automate infrastructure provisioning and CI/CD deployments with Terraform, Ansible, or other IaC frameworks
- Instrument and monitor components for performance, availability, resource consumption, and latency using observability tools (e.g., Grafana, Prometheus)
- Troubleshoot and resolve complex issues; perform root cause analyses and post-incident reviews
- Solve complex problems related to cloud infrastructure services and automate common tasks to ensure continuous availability
- Apply knowledge of cloud design patterns and service dependencies to mitigate major incidents
- Advocate for and implement security, governance, and compliance best practices
- Mentor team members and promote knowledge sharing around SRE practices and standards
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- 6+ years of experience in cloud engineering, SRE, or DevOps, with at least 4 years supporting mission-critical systems
- Experience building high-performance, resilient, scalable systems
- Practical experience designing and operating large-scale cloud-based distributed applications
- Strong hands-on skills with infrastructure-as-code (Terraform), automation (Python/Scala), and containerization (Kubernetes, Docker)
- Familiarity with AI capabilities, including large language models (LLMs), retrieval-augmented generation (RAG), and AI agents
- Working knowledge of distributed storage, data formats (Parquet, Avro), and modern analytics platforms
- Solid understanding of networking, cloud security, and compliance
- Strong analytical, troubleshooting, and communication skills
- Experience with disaster recovery, redundancy, and uptime planning
- Experience with agile software development
- Preferred certifications: SRE, Cloud Architect/Engineer (OCI, AWS, Azure, GCP), DevOps
- Resourcefulness under unique constraints
- Commitment to continuous productivity and effectiveness
- Ability to identify and automate toil tasks
- General problem solving, critical thinking, and attention to detail
- Eagerness to learn and to teach
Certain US customer or client-facing roles may require compliance with applicable requirements, such as immunization and occupational health mandates.
Range and benefit information provided is specific to the stated locations.
US: Hiring range in USD from $74,900 to $158,200 per year. May be eligible for bonus and equity. Oracle maintains broad salary ranges to account for variations in knowledge, skills, experience, market conditions, locations, and product lines.
Oracle offers a comprehensive benefits package including medical, dental, vision insurance; disability coverage; life insurance; flexible spending accounts; 401(k) with company match; paid time off; holidays; parental and adoption benefits; stock purchase plan; financial planning; and voluntary benefits.
About Us
Oracle is a world leader in cloud solutions. We are committed to an inclusive workforce and to providing accessibility accommodations upon request. Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, or protected veteran status.