Member of Technical Staff - ML Infrastructure & Performance

Job in San Mateo, San Mateo County, California, 94409, USA

Listing for: Embedding VC

Full Time position
Listed on 2026-01-05

Job specializations:

IT/Tech
Systems Engineer, AI Engineer

Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 USD yearly

Join Embedding VC as a Member of Technical Staff focused on ML infrastructure and performance. The role drives improvements in throughput, latency, and cost so that models can be deployed 2–10× faster and cheaper without compromising quality.

Scope of Work

GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.
Serving stack: TensorRT-LLM, Triton Inference Server, vLLM/TGI; continuous batching; on‑GPU KV reuse; speculative decoding/Medusa; mixture‑of‑agents routing.
Parallelism: FSDP/ZeRO, TP/PP/expert parallel, and NCCL tuning.
Quantization & PEFT: AWQ, GPTQ, FP8; LoRA/DoRA serving.
Systems: Ray, Kubernetes, Argo; observability via Prometheus/Grafana/OpenTelemetry; autoscaling, A/B infra; canary & rollback.

Tech Signals / Qualifications

Previous experience at infrastructure‑heavy startups such as Databricks or Roblox.
Strong background in GPU programming, model serving, and distributed training.

Location

San Mateo, CA. The team is committed to an on‑site, in‑person work model.
