Member of Technical Staff, Kernels

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Acceler8 Talent
Full Time position
Listed on 2026-06-24
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Hardware Engineer, IT Infrastructure
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly
Job Description

Member of Technical Staff — Kernels & GPU Performance

Employment Type: Full-time

Workplace: On-site

About the Company

We are building the execution layer for the next era of AI infrastructure.

As AI workloads scale and hardware architectures diversify, the bottleneck is no longer just access to more GPUs. The harder problem is making many kinds of compute work together efficiently, reliably, and at production scale.

Our platform intelligently partitions, schedules, and routes AI workloads across heterogeneous hardware environments, giving customers production-grade APIs without requiring them to manage hardware selection, placement, or low-level optimization.

We work with leading AI labs, hyperscalers, and AI-native companies running some of the most demanding workloads in the world.

About the Role

Every early hire changes the trajectory of the company.

As an early member of the engineering team, you will help define the systems, standards, and culture behind a new class of AI infrastructure. This is a high-ownership role for someone who wants to work close to the metal and turn theoretical hardware performance into real-world production gains.

You will design, optimize, and validate kernels that power large-scale AI workloads across both established and emerging accelerator architectures. Your work will sit at the intersection of kernels, runtimes, compilers, distributed systems, and hardware execution.

This is not a traditional GPU optimization role.

You will help build software that extracts maximum performance from increasingly diverse compute environments, where latency, throughput, memory efficiency, correctness, and utilization directly shape the economics of AI infrastructure.

What You’ll Do

In your first 12 to 18 months, you will:

  • Build and optimize kernels that improve latency, throughput, and hardware utilization for production AI workloads
  • Develop execution strategies that unlock performance across established and emerging accelerator architectures
  • Improve memory efficiency, scheduling behavior, and execution characteristics across the inference stack
  • Partner with compiler, runtime, and distributed systems engineers to optimize end-to-end performance
  • Help define how heterogeneous hardware is deployed, scheduled, and utilized at datacenter scale
  • Establish performance engineering practices that influence the long-term direction of the execution platform
You May Be a Fit If You Have
  • Strong software engineering fundamentals
  • Experience building or optimizing performance-critical systems close to hardware
  • Comfort reasoning about execution behavior, memory hierarchies, scheduling, and performance tradeoffs
  • A bias toward measurement, profiling, and rigorous validation
  • The ability to work across abstraction layers, from kernels to production systems
Strong Candidates May Also Have
  • Experience with CUDA, Triton, CUTLASS, HIP, ROCm, or other accelerator programming models
  • Deep understanding of GPU execution models, including warps or wavefronts, thread blocks, grids, occupancy, and latency hiding
  • Experience optimizing memory access patterns, including coalescing, shared memory usage, cache behavior, and bandwidth utilization
  • Familiarity with instruction-level parallelism and low-level performance tuning
  • Experience using profiling and performance analysis tools
  • Familiarity with multi-GPU, distributed execution, or large-scale inference systems
Why This Role Matters

Most AI infrastructure companies are focused on acquiring more compute.

We are focused on making every unit of compute more useful.

The next decade of AI will be defined not only by new hardware, but by the software systems that determine how effectively that hardware is used. Kernels, runtimes, and execution systems built today will shape how AI workloads run across datacenters for years to come.

As an early engineer, you will have significant ownership, work alongside deeply technical teammates, and help build the infrastructure layer that enables the next generation of AI systems.
