Senior ML Performance Engineer
Listed on 2026-02-17
IT/Tech
AI Engineer, Machine Learning / ML Engineer, Systems Engineer, Data Engineer
Position: Senior ML Performance Engineer
Location: SF Bay Area (US) or Toronto (Canada) – Hybrid
Employment Type: Full-Time
Industry: AI Infrastructure / Compiler Systems
Overview
A venture-backed AI infrastructure company is building a high-performance, portable compiler designed to let developers "build once, deploy anywhere" across cloud, edge, and hybrid environments, all optimized for resource efficiency, scalability, and sustainable AI development.
The team is looking for a Senior ML Performance Engineer to architect and lead a Performance Testing Platform from the ground up, measuring and optimizing the performance of large language models (LLMs) before and after compiler optimization on modern GPU architectures.
This role sits at the intersection of ML systems, GPU architecture, and performance engineering, with high visibility into product quality and customer impact.
Key Responsibilities
Design and implement a comprehensive performance testing platform for LLM inference workloads across GPU clusters
Define benchmarking methodologies, metrics, and test suites (latency, throughput, memory utilization, power consumption, and model accuracy)
Establish baseline performance for unoptimized models and validate post-optimization improvements
Build automated pipelines for continuous performance validation across compiler releases and model updates
Investigate performance bottlenecks using GPU profilers and system-level monitoring
Collaborate with compiler engineers, ML engineers, and DevOps to integrate performance testing into development workflows
Create dashboards and reporting to track performance trends, regressions, and wins
Document best practices for GPU-based ML performance testing
Required Qualifications
7+ years in performance engineering, benchmarking, or systems engineering roles
Strong knowledge of ML inference workloads, particularly transformer-based LLMs
Hands-on GPU programming and optimization experience (CUDA, ROCm, or similar)
Strong programming skills in Python and C/C++
Proven experience building performance testing infrastructure or benchmarking platforms from scratch
Experience with ML frameworks:
PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM
Proficiency with profiling and debugging GPU workloads
Experience with CI/CD systems and test automation frameworks
Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
Preferred Qualifications
AMD GPU experience (MI200/MI300) and ROCm ecosystem
Compiler optimization knowledge
Distributed inference and multi-GPU workloads
ML model quantization, pruning, and optimization techniques
High-performance computing or systems-level optimization
Infrastructure-as-code experience:
Kubernetes, Docker, Terraform
Contributions to open-source ML or systems projects
Detail-oriented — able to spot subtle regressions
Self-driven and accountable
Collaborative and team-oriented
Passionate about sustainable AI
Clear and effective communicator
Benefits
Competitive salary, dependent on experience and location
Equity and bonus opportunities
Medical, dental, and vision coverage
Retirement savings plan
Additional wellness benefits
Why Join
Build the infrastructure that validates high-performance ML models
Influence core product quality and customer outcomes
Work in a highly technical, high-impact environment at the forefront of AI systems
Collaborate across a globally distributed team