Site Reliability Engineer | AI Supercomputing

Job in Palo Alto, Santa Clara County, California, 94306, USA
Listing for: Luma AI
Full Time position
Listed on 2026-01-12
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 200,000 - 250,000 USD yearly
Job Description

The Opportunity

Luma AI is building the engine for multimodal general intelligence. To teach models to understand the world through video, audio, and images, we operate at the absolute frontier of computing power. We have secured the capital to deploy massive‑scale GPU clusters that rival the world's largest supercomputers, while maintaining the agility of a focused engineering lab. This role places you at the intersection of hardware and software, where you architect the physical and digital foundation of AGI.

Where You Come In

You will serve as a technical authority on the systems that power our research and product velocity. This is a role for a builder who prefers bare metal to managed services and understands that at our scale, standard cloud abstractions break down. You will architect, optimize, and maintain the massive, multi‑vendor GPU supercomputers required to train our foundational models.

What You Will Build

  • Supercomputing Architecture:
    Design and deploy high‑performance clusters combining thousands of GPUs, CPUs, and high‑throughput networking to maximize training efficiency.
  • The Network Layer:
    Optimize low‑level networking (InfiniBand, RDMA) to ensure seamless communication between accelerators, eliminating bottlenecks in distributed training jobs.
  • Hardware‑Software Synthesis:
    Collaborate with hardware partners to push the boundaries of what is possible, debugging failures at the intersection of the kernel, driver, and silicon.

The Profile We Are Looking For

  • HPC Authority:
    You possess elite knowledge of high‑performance computing (HPC), including job schedulers and the nuances of GPU architecture.
  • Deep Systems Fluency:
    You are comfortable navigating the Linux terminal to solve complex performance issues, utilizing tools like perf and strace to optimize at the OS level.
  • First‑Principles Engineering:
    You have a history of building infrastructure from the ground up, demonstrating the ability to design systems where no playbook currently exists.

Seniority level

Mid‑Senior level

Employment type

Full‑time

Job function

Engineering and Information Technology