
Cloud Inference Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Slope
Full Time position
Listed on 2026-03-07
Job specializations:
  • Engineering
    AI Engineer, Systems Engineer, Software Engineer
Salary/Wage Range or Industry Benchmark: 80,000 - 100,000 USD yearly
Job Description

Qualifications

  • CUDA + GPU inference optimization
  • vLLM, SGLang, or TensorRT-LLM experience
  • KV caching, paged attention, batching, token streaming, etc.
  • Distributed compute (experience with GPUs is a super plus)
  • No degree required
Company

Luminal (YC S25) builds an AI compiler and serving stack that makes models 10x faster and production-ready with one line.

Role

Founding role, on-site in downtown SF. Ship low-latency, high-throughput model serving on Luminal Cloud.

Day-to-Day Responsibilities

  • Deploy and tune models with optimizations like KV caching, paged attention, sequence packing, etc.
  • Conduct model performance reviews
  • Improve the scheduler, batcher, and autoscaling; profile latency, cost, and utilization
  • Sometimes write kernels and, yes, do the occasional tasteful shitposting