Founding GPU Compiler Engineer
Listed on 2026-01-06
Software Development
AI Engineer, Machine Learning / ML Engineer
About SF Tensor
At The San Francisco Tensor Company, we believe the future of AI and high-performance computing depends on rethinking the entire software and infrastructure stack. Today's developers face bottlenecks across hardware, cloud, and code optimization that slow progress before ideas can reach their full potential. Our mission is to remove those barriers and make compute faster, cheaper, and universally portable.
We are building a Kernel Optimizer that automatically transforms code into its most efficient form, combined with Tensor Cloud for adaptive, cross-cloud compute and Emma Lang, a new programming language for high-performance, hardware-aware computation. Together, these technologies reinvent the foundations of AI and HPC.
SF Tensor is proudly backed by Susa Ventures and Y Combinator, as well as a group of angels including Max Mullen and Paul Graham, along with founders and executives of Neuralink, Notion, and AMD. We are partnering with researchers, engineers, and organizations who share our belief that the next breakthroughs in AI require breakthroughs in compute.
About the Role
We're hiring a Founding GPU Compiler Engineer to build the core compilation infrastructure for our AI compiler: taking models from PyTorch, JAX, and TensorFlow and turning them into highly optimized binaries for large-scale AI pre-training.
You'll own the entire compiler stack, from ingesting StableHLO all the way to backend code generation, and you'll work across targets like NVIDIA, AMD, Trainium, and TPU. You'll help shape our architecture, tooling, and overall engineering culture from the very beginning.
Responsibilities
- Design and implement the main compilation pipeline, from StableHLO to executable GPU and host binaries
- Build and extend MLIR dialects and passes to optimize AI workloads
- Develop backend code generation for multiple targets (NVIDIA PTX/SASS, AMD GCN/RDNA, Trainium, TPU)
- Implement classic compiler optimizations customized for large-scale training (fusion, tiling, memory planning, scheduling)
- Build search-based compiler infrastructure to explore different optimization options
- Create hybrid codegen paths for cases where direct MLIR lowering isn't practical
- Set up testing, benchmarking, and performance regression systems
- Work closely with ML researchers to understand workload characteristics and find optimization opportunities
What We're Looking For
- Deep experience with compiler infrastructure (LLVM, MLIR, or similar)
- Strong background in GPU architecture and low-level optimization (CUDA, ROCm, or equivalent)
- Hands-on experience with at least one of: PTX/SASS, GCN/RDNA assembly, or other GPU ISAs
- Familiarity with ML compiler stacks (XLA, TVM, Triton, torch.compile, or similar)
- Solid systems programming skills in C++ and/or Rust
- Proven track record of building production-grade compiler infrastructure
Nice to Have
- Background in distributed systems or multi-device compilation
- Contributions to open-source compiler projects
- Experience with autotuning or search-based optimization
- Familiarity with large-scale training infrastructure
- Experience with (Stable)HLO
You'll be one of the first engineers defining how we compile and optimize AI workloads. It's a rare chance to build a compiler stack from the ground up, with a direct impact on the efficiency of large-scale AI training.
We believe in the power of in-person collaboration to solve the hardest problems and foster a strong team culture. We offer relocation assistance and look forward to you joining us in our San Francisco office.
The base salary range for this full-time position is $285,000 - $315,000 + bonus + equity + benefits.