Cloud Inference Engineer
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-03-07
Listing for: Slope
Full Time position
Job specializations:
- Engineering: AI Engineer, Systems Engineer, Software Engineer
Job Description
Qualifications
- CUDA + GPU inference optimization
- vLLM, SGLang, or TensorRT-LLM experience
- KV caching, paged attention, batching, token streaming, etc.
- Distributed compute (experience with GPUs is a super plus)
- No degree required
Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line of code.
Role
Founding role, on site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.
Day-to-Day Responsibilities
- Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
- Conduct model performance reviews
- Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
- Sometimes write kernels and, yes, do the occasional tasteful shitposting