
Machine Learning Engineer

Job in San Francisco, San Francisco County, California, 94199, USA
Listing for: Scouto AI
Full-time position
Listed on 2026-02-25
Job specializations:
  • Software Development
    Machine Learning / ML Engineer, AI Engineer
Salary/Wage Range: 180,000 - 250,000 USD yearly
Job Description
Overview

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single, cohesive compute plane for running large language models such as DeepSeek and Llama 4. At any given moment, more than 5,000 GPUs and hundreds of terabytes of VRAM are connected to the network. We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems.

We primarily work in-person from our office in downtown San Francisco.

Responsibilities

• Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models

• Deploy and maintain large language models at scale in production environments

• Deploy new models as they are released by frontier labs

• Implement techniques like quantization, speculative decoding, and KV cache reuse

• Contribute regularly to open-source projects such as SGLang and vLLM

• Deep dive into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues

• Collaborate with the engineering team to bring new features and capabilities to our inference platform

• Develop robust and scalable infrastructure for AI model serving

• Create and maintain technical documentation for inference systems

Requirements

• 3+ years of experience writing high-performance, production-quality code

• Strong proficiency with Python and deep learning frameworks, particularly PyTorch

• Demonstrated experience with LLM inference optimization techniques

• Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred

• Familiarity with Docker and Kubernetes for containerized deployments

• Experience with CUDA programming and GPU optimization

• Strong understanding of distributed systems and scalability challenges

• Proven track record of optimizing AI models for production environments

Nice to Have

• Familiarity with TensorRT and TensorRT-LLM

• Knowledge of vision models and multimodal AI systems

• Experience implementing techniques like quantization and speculative decoding

• Contributions to open-source machine learning projects

• Experience with large-scale distributed computing

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 to $250,000, plus equity and benefits including:

• Full healthcare coverage

• Quarterly offsites

• Flexible PTO

Skills:

pytorch, gpu optimization, deep learning frameworks, sglang, vllm, cuda programming, machine learning, python, llm
