Principal Engineer - Perf and Benchmarking
Job in Sunnyvale, Santa Clara County, California, 94087, USA
Listed on 2026-06-03
Listing for: CoreWeave
Full Time position
Job specializations:
- IT/Tech
- Data Engineering, Systems Engineer
Job Description
CoreWeave is the essential cloud platform for AI. Built for pioneers by pioneers, CoreWeave delivers technology, tools, and teams that enable innovators to build and scale AI with confidence.
About this role
We are looking for a Principal Engineer to lead the Benchmarking & Performance team. You will manage a planet‑scale performance data warehouse and help deliver industry‑leading, end‑to‑end performance benchmark publications.
What you’ll do
- Strategy & Leadership – Define the multi‑year benchmarking strategy and roadmap; prioritize models/workloads (LLMs, diffusion, vision, speech) and hardware tiers. Build, lead, and mentor a high‑performing team of performance engineers and data analysts. Establish governance for claims: documented methodologies, versioning, reproducibility, and audit trails.
- Perf Ownership – Lead end‑to‑end MLPerf Inference and Training submissions: workload selection, cluster planning, runbooks, audits, and result publication. Coordinate optimization tracks with NVIDIA (CUDA, cuDNN, TensorRT/TensorRT‑LLM, Triton, NCCL) to hit competitive results; drive upstream fixes where needed.
- Internal Latency & Throughput Benchmarks – Design a Kubernetes‑native, repeatable benchmarking service that exercises CoreWeave stacks across SUNK, Kueue, and Kubeflow pipelines. Measure and report latency, jitter, tokens/s, time‑to‑first‑token, cold‑start/warm‑start, and cost‑per‑token across models, precisions, batch sizes, and GPU types. Maintain a corpus of representative scenarios and datasets; automate comparisons across software releases and hardware generations.
- Tooling & Automation – Build CI/CD pipelines and K8s controllers/operators to schedule benchmarks at scale; integrate with observability stacks (Prometheus, Grafana, OpenTelemetry) and results warehouses. Implement supply‑chain integrity for benchmark artifacts (SBOMs, Cosign signatures).
- Cross‑functional & Community – Partner with NVIDIA, key ISVs, and OSS projects to co‑develop optimizations and upstream improvements. Support Sales/SEs with authoritative numbers for RFPs and competitive evaluations; brief analysts and press with rigorous, defensible data.
- 10+ years building distributed systems or HPC/cloud services, with deep expertise in large‑scale ML training or similar high‑performance workloads.
- Proven track record of architecting or building planet‑scale data systems (e.g., telemetry platforms, observability stacks, cloud data warehouses, large‑scale OLAP engines).
- Deep understanding of GPU performance (CUDA, NCCL, RDMA, NVLink/PCIe, memory bandwidth), model‑server stacks (Triton, vLLM, TensorRT‑LLM, TorchServe), and distributed training frameworks (PyTorch FSDP/DeepSpeed/Megatron‑LM).
- Proficient with Kubernetes and ML control planes; familiarity with SUNK, Kueue, and Kubeflow in production environments.
- Excellent communicator able to interface with executives, customers, auditors, and OSS communities.
- Experience with time‑series databases, log‑structured merge (LSM) trees, or custom storage engine development.
- Experience running MLPerf submissions or equivalent audited benchmarks at scale.
- Contributions to MLPerf, Triton, vLLM, PyTorch, KServe, or similar OSS projects.
- Experience benchmarking multi‑region fleets and large clusters (thousands of GPUs).
- Publications/talks on ML performance, latency engineering, or large‑scale benchmarking methodology.
Base salary range: $206,000 – $333,000. The starting salary will be determined based on job‑related knowledge, skills, experience, and market location. In addition to base salary, compensation includes a discretionary bonus, equity awards, and a comprehensive benefits program.
Benefits
- Medical, dental, and vision insurance – 100% paid by CoreWeave
- Company‑paid Life Insurance
- Voluntary supplemental life insurance
- Short and long‑term disability insurance
- Flexible Spending Account
- Health Savings Account
- Tuition Reimbursement
- Employee Stock Purchase Program (ESPP)
- Mental Wellness Benefits through Spring Health
- Family‑forming support provided by Carrot
- Paid Parental Leave
- Flexible, full‑service childcare support with Kinside
- 401(k) with a generous…