
Member of Technical Staff - ML Infrastructure & Performance

Job in San Mateo, San Mateo County, California, 94409, USA
Listing for: Embedding VC
Full Time position
Listed on 2026-06-03
Job specializations:
  • IT/Tech
    AI Engineer (Applied/Software), Systems Engineer, Data Engineering, Data Scientist
Job Description
Introducing Moonlake: AI for creating real-time interactive content.

Mission: Improve throughput, latency, and cost by deploying our models 2-10× faster and cheaper without quality regressions.

Scope of Work:

- GPU performance: CUDA/Triton kernels, FlashAttention family, paged attention, CUDA Graphs.

- Serving stack: TensorRT-LLM/Triton Inference Server, vLLM/TGI; continuous batching; on-GPU KV-cache reuse; speculative decoding/Medusa; mixture-of-agents routing.

- Parallelism: FSDP/ZeRO, TP/PP/expert parallelism; NCCL tuning.

- Quantization/PEFT: AWQ/GPTQ/FP8; LoRA/DoRA serving.

- Systems: Ray/Kubernetes/Argo; observability (Prometheus/Grafana/OpenTelemetry); autoscaling; A/B testing infrastructure; canary deployments with rollback.

Tech signals:

Previous experience at infrastructure-heavy startups such as Databricks or Roblox.

We are committed to being an on-site, in-person team, currently based in San Mateo.