Member of Technical Staff, Kernels (San Francisco area, California, USA; IT/Tech)

Member of Technical Staff — Kernels & GPU Performance

Employment Type: Full-time

Workplace: On-site

About the Company

We are building the execution layer for the next era of AI infrastructure.

As AI workloads scale and hardware architectures diversify, the bottleneck is no longer just access to more GPUs. The harder problem is making many kinds of compute work together efficiently, reliably, and at production scale.

Our platform intelligently partitions, schedules, and routes AI workloads across heterogeneous hardware environments, giving customers production-grade APIs without requiring them to manage hardware selection, placement, or low-level optimization.

We work with leading AI labs, hyperscalers, and AI-native companies running some of the most demanding workloads in the world.

About the Role

Every early hire changes the trajectory of the company.

As an early member of the engineering team, you will help define the systems, standards, and culture behind a new class of AI infrastructure. This is a high-ownership role for someone who wants to work close to the metal and turn theoretical hardware performance into real-world production gains.

You will design, optimize, and validate kernels that power large-scale AI workloads across both established and emerging accelerator architectures. Your work will sit at the intersection of kernels, runtimes, compilers, distributed systems, and hardware execution.

This is not a traditional GPU optimization role.

You will help build software that extracts maximum performance from increasingly diverse compute environments, where latency, throughput, memory efficiency, correctness, and utilization directly shape the economics of AI infrastructure.

What You’ll Do

In your first 12 to 18 months, you will:

Build and optimize kernels that improve latency, throughput, and hardware utilization for production AI workloads
Develop execution strategies that unlock performance across established and emerging accelerator architectures
Improve memory efficiency, scheduling behavior, and execution characteristics across the inference stack
Partner with compiler, runtime, and distributed systems engineers to optimize end-to-end performance
Help define how heterogeneous hardware is deployed, scheduled, and utilized at datacenter scale
Establish performance engineering practices that influence the long-term direction of the execution platform

You May Be a Fit If You Have

Strong software engineering fundamentals
Experience building or optimizing performance‑critical systems close to hardware
Comfort reasoning about execution behavior, memory hierarchies, scheduling, and performance tradeoffs
A bias toward measurement, profiling, and rigorous validation
The ability to work across abstraction layers, from kernels to production systems

Strong Candidates May Also Have

Experience with CUDA, Triton, CUTLASS, HIP, ROCm, or other accelerator programming models
Deep understanding of GPU execution models, including warps or wavefronts, blocks, grids, occupancy, and latency hiding
Experience optimizing memory access patterns, including coalescing, shared memory usage, cache behavior, and bandwidth utilization
Familiarity with instruction‑level parallelism and low-level performance tuning
Experience using profiling and performance analysis tools
Familiarity with multi‑GPU, distributed execution, or large‑scale inference systems

Why This Role Matters

Most AI infrastructure companies are focused on acquiring more compute.

We are focused on making every unit of compute more useful.

The next decade of AI will be defined not only by new hardware, but by the software systems that determine how effectively that hardware is used. Kernels, runtimes, and execution systems built today will shape how AI workloads run across datacenters for years to come.

As an early engineer, you will have significant ownership, work alongside deeply technical teammates, and help build the infrastructure layer that enables the next generation of AI systems.
