Member of Technical Staff - ML Infrastructure & Performance
Job in San Mateo, San Mateo County, California, 94409, USA
Listed on 2026-01-05
Listing for: Embedding VC
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, AI Engineer
Join Embedding VC as a Member of Technical Staff focused on ML infrastructure and performance. You will drive improvements in throughput, latency, and cost so that models can be deployed 2–10× faster and cheaper without compromising quality.
Scope of Work
- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.
- Serving stack: TensorRT-LLM, Triton Inference Server, vLLM/TGI; continuous batching; on‑GPU KV reuse; speculative decoding/Medusa; mixture‑of‑agents routing.
- Parallelism: FSDP/ZeRO, TP/PP/expert parallel, and NCCL tuning.
- Quantization & PEFT: AWQ, GPTQ, FP8; LoRA/DoRA serving.
- Systems: Ray, Kubernetes, Argo; observability via Prometheus/Grafana/OpenTelemetry; autoscaling, A/B infra; canary & rollback.
Qualifications
- Previous experience at infrastructure‑heavy startups such as Databricks or Roblox.
- Strong background in GPU programming, model serving, and distributed training.
Location:
San Mateo, CA. The team is committed to an on‑site, in‑person work model.