Engineer, Inference & Model serving
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-02
Listing for: techire ai
Full Time position
Job specializations:
- Software Development
- AI Engineer, Machine Learning / ML Engineer
Job Description & How to Apply Below
ML Model Serving Engineer
Want to build the layer that actually makes AI usable in real time?
You'll join a team focused on inference, where performance is the product. This is about delivering low-latency, high-throughput systems across LLMs, speech, and vision models running in production, not offline experiments.
They're building real-time AI systems that need to respond instantly and reliably. That means solving hard problems around batching, GPU efficiency, memory constraints, and system-level bottlenecks that most teams never fully crack.
You'll sit at the core of the platform, working across model serving, infrastructure, and performance optimisation. A big part of the role is pushing current tooling beyond its limits, extending frameworks, profiling bottlenecks, and designing systems that hold up under real-world load.
This is not about training models. It's about making them fast, efficient, and production-ready.
What you'll work on:
- Building high-performance serving systems for LLM, speech, and vision models
- Scaling inference to production workloads with strict latency requirements
- Optimising GPU utilisation and execution efficiency
- Implementing techniques like continuous batching, KV cache optimisation, speculative decoding, and prefill/decode separation
- Improving frameworks such as vLLM, TensorRT-LLM, Triton, and SGLang
- Profiling and debugging performance across GPU, memory, and system layers
What you'll need:
- Strong experience with ML inference or model serving systems
- Deep understanding of latency and throughput optimisation in production
- Solid Python and PyTorch skills, plus a systems or performance engineering mindset
- Familiarity with distributed systems and production infrastructure
You'll join a highly technical team with experience across major AI labs and big tech. The environment is pragmatic, focused on solving real performance problems rather than abstract research.
There's real ownership here. You'll help define how next-generation AI systems are served.
Package:
$220,000 - $320,000 base + equity
San Francisco, onsite 3 days per week
If you're interested in working on the part of AI that actually determines whether it works in the real world, this is worth exploring.
All applicants will receive a response.