Principal Software Developer
Listed on 2026-01-01
IT/Tech
Cloud Computing, Systems Engineer
Job Description
AI-Driven Incident Remediation (LARS)
Design and implement new LARS remediation workflows across single-tenant (ST) and multi-tenant (MT) OAC instances.
Expand automated coverage for service health, capacity, cluster availability, and network-related alarms.
Enhance AI-driven diagnostics, triage, pattern detection, and auto-approval pipelines for incident mitigation.
Improve observability, cross-pod dashboarding, and multi-pod coordinated incident support.
Develop and integrate AI-assisted diagnostics and automated mitigation for high-severity production incidents.
Contribute to Agentic DevOps initiatives, including autonomous remediation frameworks and prototype agent workflows.
Collaborate with ML teams to incorporate models for anomaly detection, root-cause analysis, and remediation recommendations.
Build tooling and CI/CD pipeline extensions to eliminate manual change processes and streamline deployment safety.
Design guardrails, approval workflows, and automated rollouts to improve release reliability and reduce operational toil.
Develop MCP servers and agent orchestration workflows enabling end-to-end automated diagnostics and incident resolution.
Integrate agent-driven actions with existing automation systems (LARS, CI/CD, service health signals).
Contribute to next-generation self-healing and autonomous operations capabilities across OAC services.
Required Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 6 years of experience in DevOps, Site Reliability Engineering, or related roles.
- Strong proficiency in Python and Java for automation and system development.
- Experience working with Oracle Cloud Infrastructure (OCI), preferably with Oracle Analytics Cloud or similar Oracle PaaS services.
- Solid understanding of containerization and orchestration (Docker, Kubernetes, Helm).
- Experience with Git-based workflows, artifact repositories, and CI/CD tooling.
- Hands‑on knowledge of Linux/Unix system administration and networking fundamentals.
- Familiarity with monitoring tools (Prometheus, Grafana, ELK stack, OCI Monitoring).
- Ability to work in a fast‑paced, agile environment with a proactive mindset.
Preferred Qualifications
- Oracle Cloud certifications (e.g., Oracle Cloud Infrastructure DevOps Professional).
- Experience with secure DevOps (DevSecOps) practices.
- Background in analytics platforms or data engineering is a plus.
- Experience contributing to open‑source or internal developer platforms.
- Experience integrating MCP servers and building RAG-based knowledge bases from semi‑structured documents and logs.
Disclaimer:
Certain US customer or client‑facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
Range and benefit information provided in this posting are specific to the stated locations only.
US:
Hiring Range in USD from: $96,800 to $251,600 per annum. May be eligible for bonus, equity, and compensation deferral.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle’s differing products, industries and lines of business.
Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
Flexible Vacation is provided to all eligible employees assigned to a salaried (non‑overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual…