AI Runtime Engineer

Job in Santa Clara, Santa Clara County, California, 95053, USA
Listing for: FlexAI
Full Time position
Listed on 2026-06-05
Job specializations:
  • Software Development
    AI Engineer, Cloud Engineer - Software
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly
Job Description & How to Apply Below
Position: Staff AI Runtime Engineer

Build and Deploy AI the right way, anywhere.

The FlexAI Compute Infrastructure Platform provides an "end-to-end AI compute layer" for running and managing workloads across any cloud, any GPU, and any deployment model (public, hybrid, or on-prem). It brings together "1-click simplicity" for users with "enterprise-grade orchestration, security, and automation" under the hood.

Founded by Brijesh Tripathi, who brings experience from Nvidia, Apple, Tesla, Intel and Zoox, FlexAI is not just building a product – we’re shaping the future of AI. Our teams are strategically distributed across Silicon Valley and Bengaluru, united by a shared mission: to deliver more compute with less complexity.

If you're passionate about shaping the future of artificial intelligence, driving innovation, and contributing to a sustainable and inclusive AI ecosystem, FlexAI is the place for you!

Role Overview

At FlexAI, we’re building a high-performance, cloud-agnostic AI compute platform designed for next-generation training and inference workloads. As a Staff AI Runtime Engineer, you’ll play a pivotal role in the design, development, and optimization of the core runtime infrastructure that powers distributed training and deployment of large AI models (LLMs and beyond).

This is a hands‑on leadership role – perfect for a systems‑minded software engineer who thrives at the intersection of AI workloads, runtimes, and performance‑critical infrastructure. You’ll own critical components of our PyTorch-based stack, lead technical direction, and collaborate across engineering, research, and product to push the boundaries of elastic, fault‑tolerant, high‑performance model execution.

What You’ll Do
Lead Runtime Design & Development:
  • Own the core runtime architecture supporting AI training and inference at scale.
  • Design resilient and elastic runtime features (e.g. dynamic node scaling, job recovery) within our custom PyTorch stack.
  • Optimize distributed training reliability, orchestration, and job-level fault tolerance.
Drive Performance at Scale:
  • Profile and enhance low‑level system performance across training and inference pipelines.
  • Improve packaging, deployment, and integration of customer models in production environments.
  • Ensure consistent throughput, latency, and reliability metrics across multi‑node, multi‑GPU setups.
Build Internal Tooling & Frameworks:
  • Design and maintain libraries and services that support model lifecycle: training, checkpointing, fault recovery, packaging, and deployment.
  • Implement observability hooks, diagnostics, and resilience mechanisms for deep learning workloads.
  • Champion best practices in CI/CD, testing, and software quality across the AI Runtime stack.
  • Work cross-functionally with Research, Infrastructure, and Product teams to align runtime development with customer and platform needs.
  • Guide technical discussions, mentor junior engineers, and help scale the AI Runtime team’s capabilities.
What You’ll Need to Be Successful
  • 8+ years of experience in systems/software engineering, with deep exposure to AI runtime, distributed systems, or compiler/runtime interaction.
  • Proven experience optimizing and scaling deep learning runtimes (e.g. PyTorch, TensorFlow, JAX) for large‑scale training and/or inference.
  • Strong programming skills in Python and C++ (Go or Rust is a plus).
  • Familiarity with distributed training frameworks, low‑level performance tuning, and resource orchestration.
  • Experience working with multi‑GPU, multi‑node, or cloud-native AI workloads.
  • Solid understanding of containerized workloads, job scheduling, and failure recovery in production environments.
Nice to Have
  • Contributions to PyTorch internals or open-source DL infrastructure projects.
  • Familiarity with LLM training pipelines, checkpointing, or elastic training orchestration.
  • Experience with Kubernetes, Ray, Torch Elastic, or custom AI job orchestrators.
  • Background in systems research, compilers, or runtime architecture for HPC or ML.
  • Previous start‑up experience.

This position is in‑person and located at our Santa Clara, CA office.

What We Offer
  • A competitive salary and benefits package
  • Work on cutting‑edge AI infrastructure
  • Build products used by developers and enterprises
  • High ownership, fast execution, real impact