Software Developer, Systems Engineer, Cloud Computing
Job in
Santa Fe, Santa Fe County, New Mexico, 87501, USA
Listed on 2026-06-14
Listing for:
Oracle
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, Cloud Computing: Infrastructure & Operations
Job Description & How to Apply Below
**Job Description**
The Oracle Cloud Infrastructure (OCI) team offers the opportunity to build and operate massive-scale, integrated cloud services in a broadly distributed, multi-tenant cloud environment. OCI builds cloud products for customers who are tackling some of the world's largest technical and business challenges.
Oracle Kubernetes Engine (OKE) is OCI's managed Kubernetes service. OKE enables customers to create, run, scale, secure, and operate Kubernetes clusters on OCI, integrating Kubernetes with OCI compute, networking, storage, identity, observability, security, and automation. The OKE team owns a highly available 24x7 cloud service and is expanding the platform to support larger clusters, higher scale, improved operability, deeper OCI integrations, and increasingly demanding cloud native, AI, and GPU workloads.
We are looking for a senior IC5 software engineer with deep Kubernetes expertise, cloud infrastructure experience (required), and a strong distributed systems background. This is a high-impact technical leadership role for an engineer who can define architecture, drive cross-team execution, solve ambiguous production and platform problems, and deliver durable systems that improve both customer experience and operational excellence.
You will work on core OKE platform capabilities including cluster lifecycle management, orchestration, scalability, reliability, performance, automation, observability, security, and integration with OCI infrastructure services. The ideal candidate has hands-on experience designing, building, operating, or deeply debugging production cloud services, infrastructure platforms, or Kubernetes-based systems at meaningful scale.
This role requires advanced Kubernetes experience, including Kubernetes control plane behavior, controllers and operators, scheduling, autoscaling, networking, storage, service discovery, container runtimes, node lifecycle, Kubernetes APIs, and etcd.
Experience with Kubernetes networking and storage technologies such as CNI, Cilium, Calico, Flannel, other container networking implementations, CSI drivers, and cloud provider integrations is highly relevant.
OKE is also expanding to support demanding AI and accelerated computing use cases.
Experience with AI/ML infrastructure, multi-node GPU clusters, accelerated compute, model training or inference platforms, GPU scheduling, device plugins, Karpenter, cluster autoscaling, CUDA, NCCL, RoCE, InfiniBand, RDMA, SmartNIC/DPU offload, or high-performance AI/HPC networking is a significant plus.
This role also requires an engineer who is ready to use modern agentic engineering practices responsibly. We expect senior engineers to apply AI-assisted and agentic workflows to accelerate design exploration, implementation, testing, debugging, documentation, operational analysis, and developer productivity while maintaining strong ownership, security judgment, code quality, and production accountability.
**Responsibilities**
As a member of the software engineering division, you will take an active role in defining and evolving standard practices and procedures. You will define specifications for significant new projects and specify, design, develop, troubleshoot, and debug software for OCI's managed Kubernetes service.
Responsibilities include:
+ Provide technical leadership for major OKE platform initiatives from architecture through implementation, launch, and production operation.
+ Design and build distributed systems that create, update, scale, repair, and operate Kubernetes clusters across OCI regions.
+ Improve OKE reliability, scalability, performance, upgrade safety, lifecycle management, observability, automation, and operational tooling.
+ Work deeply with Kubernetes technologies, including control plane components, controllers/operators, scheduling, autoscaling, Kubernetes APIs, container runtimes, node behavior, and etcd.
+ Design, debug, and improve Kubernetes networking and storage integrations, including CNI-based networking, Cilium, Calico, Flannel, other container networking implementations, CSI drivers, and OCI infrastructure integrations.
+ Build automation for cluster…