Principal Member of Technical Staff, AI Infrastructure Job Montpelier area, Vermont USA, Software Development

Principal Member of Technical Staff, AI Infrastructure

Austin, TX, United States

Job Category: Product Development

Posting Date: 05/13/2026

Type: Regular Employee

Security Clearance: No

Years of Experience: 6 to 10+ years

Languages: English (read, write, speak)

Job Description

Oracle Cloud Infrastructure (OCI) is building the world’s largest AI clusters and is leading the effort by creating a GPU-focused cloud. This role involves creating systems that allow customers to scale from tens to thousands of GPUs while maintaining performance.

We need a highly skilled distributed systems engineer to scale and optimize AI infrastructure components, such as the GPU control plane and GPU data plane, that provide computing resources for AI workloads. The engineer will provide technical leadership, clarify ambiguous problems, and drive innovative solutions in collaboration with cross‑functional teams.

Responsibilities

Design and develop scalable AI compute infrastructure, including GPU control plane and GPU data plane, to optimize customer experience and workload performance.
Develop “best‑in‑class” AI compute infrastructure that is modular, secure, reliable, diagnosable, monitored, compliant, and reusable.
Collaborate with development, operations, and product teams to understand requirements and design orchestration solutions.
Mentor junior developers and promote modern software engineering practices: data telemetry for decisions, well‑defined interfaces, design reviews, coding standards, code reviews, unit and integration testing, and production monitoring.
Develop benchmark metrics and automation to track performance and reliability across customer workloads and lower infrastructure layers.

Qualifications & Skills

BS (or equivalent) in Computer Science, Engineering, or related field.
6+ years of software development experience with languages such as C, C++, C#, Java, Go, Rust.
3+ years designing and developing large‑scale infrastructure, distributed systems, and services.
1+ year of technical leadership experience, providing clarity to cross‑functional teams.
Systematic problem‑solving, strong communication, sense of ownership, and drive.
Ability to adapt to a fast‑paced, dynamic environment and manage multiple tasks and priorities.

Preferred Qualifications

Experience managing cloud infrastructure with hundreds of thousands of servers.
Experience with Docker and Kubernetes.
Experience scheduling high‑performance workloads on Kubernetes or Slurm.

Compensation & Benefits

US:
Hiring range $96,800–$223,400 per annum. Bonus and equity may apply.

Benefits include medical, dental, vision, and disability coverage; life insurance; 401(k) with company match; paid time off; paid holidays; paid sick leave; parental leave; adoption assistance; employee stock purchase plan; financial planning; and voluntary benefits.

EEO Statement

Oracle is an Equal Employment Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability, protected veteran status, or any other characteristic protected by law. Oracle will consider qualified applicants with arrest and conviction records pursuant to applicable law.
