Sr Principal AI Software Engineer - ML & AI Innovation
Listed on 2026-02-16
IT/Tech
AI Engineer, Machine Learning/ ML Engineer
The Senior Principal AI/ML Software Engineer is responsible for evaluating, integrating, and optimizing cutting-edge technologies for AI/ML infrastructure, focusing on low latency, high throughput, and efficient resource utilization for both model training and inference. This role guides key strategic decisions related to Oracle Cloud’s AI infrastructure offerings, spearheads the design and implementation of scalable orchestration for AI/ML workloads, incorporating the latest research in generative AI and large language models, and leads initiatives such as Retrieval-Augmented Generation and model fine-tuning.
The ideal candidate will design and develop scalable, GPU-accelerated AI services using Kubernetes and languages such as Python or Go, and must possess strong programming skills; deep expertise in deep learning frameworks, containerization, distributed systems, and parallel computing; and a comprehensive understanding of end-to-end AI/ML workflows.
Responsibilities:
- Evaluate, integrate, and optimize state-of-the-art technologies across the stack for latency, throughput, and resource utilization in training and inference workloads.
- Guide strategic decisions around Oracle Cloud’s AI infrastructure offerings.
- Design and implement scalable orchestration for serving and training AI/ML models, including model parallelism and performance optimization across the AI/ML stack.
- Explore and incorporate contemporary research on generative AI, agents, and inference systems into the LLM software stack.
- Lead initiatives in generative AI systems design, including Retrieval-Augmented Generation (RAG) and LLM fine-tuning.
- Design and develop scalable services and tools to support GPU-accelerated AI pipelines, leveraging Kubernetes, Python/Go, and observability frameworks.
Qualifications:
- Bachelor’s, Master’s, or Ph.D. in Computer Science, Engineering, Machine Learning, or a related field (or equivalent experience).
- Experience with machine learning and deep learning concepts, algorithms, and models.
- Proficiency with orchestration and containerization tools like Kubernetes, Docker, or similar.
- Expertise in modern container networking and storage architecture.
- Expertise in orchestrating, running, and optimizing large-scale distributed training and inference workloads.
- Deep understanding of AI/ML workflows, encompassing data processing, model training, and inference pipelines.
- Experience with parallel computing frameworks and paradigms.
- Strong programming skills and proficiency in major deep learning frameworks.