Machine Learning Engineer Job, San Francisco area, California, USA, Software Development

Overview

We are building a distributed LLM inference network that combines idle GPU capacity from around the world into a single cohesive plane of compute for running large language models such as DeepSeek and Llama 4. At any given moment, we have over 5,000 GPUs and hundreds of terabytes of VRAM connected to the network. We are a small, well-funded team working on difficult, high-impact problems at the intersection of AI and distributed systems.

We primarily work in-person from our office in downtown San Francisco.

Responsibilities

• Design and implement optimization techniques to increase model throughput and reduce latency across our suite of models

• Deploy and maintain large language models at scale in production environments

• Deploy new models as they are released by frontier labs

• Implement techniques like quantization, speculative decoding, and KV cache reuse

• Contribute regularly to open source projects such as SGLang and vLLM

• Deep dive into the underlying codebases of TensorRT, PyTorch, TensorRT-LLM, vLLM, SGLang, CUDA, and other libraries to debug ML performance issues

• Collaborate with the engineering team to bring new features and capabilities to our inference platform

• Develop robust and scalable infrastructure for AI model serving

• Create and maintain technical documentation for inference systems

Requirements

• 3+ years of experience writing high-performance, production-quality code

• Strong proficiency with Python and deep learning frameworks, particularly PyTorch

• Demonstrated experience with LLM inference optimization techniques

• Hands-on experience with SGLang and vLLM, with contributions to these projects strongly preferred

• Familiarity with Docker and Kubernetes for containerized deployments

• Experience with CUDA programming and GPU optimization

• Strong understanding of distributed systems and scalability challenges

• Proven track record of optimizing AI models for production environments

Nice to Have

• Familiarity with TensorRT and TensorRT-LLM

• Knowledge of vision models and multimodal AI systems

• Experience implementing techniques like quantization and speculative decoding

• Contributions to open source machine learning projects

• Experience with large-scale distributed computing

Compensation

We offer competitive compensation, equity in a high-growth startup, and comprehensive benefits. The base salary range for this role is $180,000 to $250,000. Benefits include:

• Full healthcare coverage

• Quarterly offsites

• Flexible PTO

Skills:

pytorch, gpu optimization, deep learning frameworks, sglang, vllm, cuda programming, machine learning, python, llm


