Inference engineer
Listed on 2026-05-24
Engineering
Systems Engineer
We're looking for engineers who can bridge the gap between ML research and high-performance inference.
You'll work across our inference engine and model conversion toolkit, implementing new model architectures, supporting new modalities, writing optimized kernels, and building a wide range of features such as function calling and batch decoding.
This role is ideal for someone who reads papers for fun, enjoys writing high-performance code, and gets excited about constant learning.
Nobody knows everything. We'd rather you know one area deeply than everything superficially. If you're strong in at least a couple of these areas, you're a great fit:
- JAX / Equinox / Pallas stack
- Rust systems programming with a focus on developer experience
- Writing Metal / Vulkan kernels
- Neural codecs and voice model architectures
- Trellis-based quantization approaches
- Advanced speculative decoding methods, such as EAGLE
- Deep understanding of Transformer / SSM / Diffusion / Vision-language models
- Benchmarking inference performance and model quality
- Strong linear algebra, optimization methods, and probability theory
And of course, basic engineering skills: we will ship a lot of code.
We welcome applications from students and early-career engineers. If you've participated in projects that demonstrate systems thinking and ML understanding, we want to hear from you!