Senior ML Performance Engineer
Listed on 2026-02-17
IT/Tech
AI Engineer, Machine Learning / ML Engineer, Systems Engineer, Data Engineer
Position: Senior ML Performance Engineer
Location: SF Bay Area (US) or Toronto (Canada) – Hybrid
Employment Type: Full-Time
Industry: AI Infrastructure / Compiler Systems
Overview
A venture-backed AI infrastructure company is building a high-performance, portable compiler designed to let developers "build once, deploy anywhere" across cloud, edge, and hybrid environments, all optimized for resource efficiency, scalability, and sustainable AI development.
The team is looking for a Senior ML Performance Engineer to architect and lead a Performance Testing Platform from the ground up, measuring and optimizing the performance of large language models (LLMs) before and after compiler optimization on modern GPU architectures.
This role sits at the intersection of ML systems, GPU architecture, and performance engineering, with high visibility into product quality and customer impact.
Key Responsibilities
Design and implement a comprehensive performance testing platform for LLM inference workloads across GPU clusters
Define benchmarking methodologies, metrics, and test suites (latency, throughput, memory utilization, power consumption, and model accuracy)
Establish baseline performance for unoptimized models and validate post-optimization improvements
Build automated pipelines for continuous performance validation across compiler releases and model updates
Investigate performance bottlenecks using GPU profilers and system-level monitoring
Collaborate with compiler engineers, ML engineers, and DevOps to integrate performance testing into development workflows
Create dashboards and reporting to track performance trends, regressions, and wins
Document best practices for GPU-based ML performance testing
Required Qualifications
7+ years in performance engineering, benchmarking, or systems engineering roles
Strong knowledge of ML inference workloads, particularly transformer-based LLMs
Hands-on GPU programming and optimization experience (CUDA, ROCm, or similar)
Strong programming skills in Python and C/C++
Proven experience building performance testing infrastructure or benchmarking platforms from scratch
Experience with ML frameworks:
PyTorch, TensorFlow, ONNX Runtime, vLLM, TensorRT-LLM
Proficiency with profiling and debugging GPU workloads
Experience with CI/CD systems and test automation frameworks
Strong analytical skills with the ability to design experiments, analyze results, and communicate findings clearly
Preferred Qualifications
AMD GPU experience (MI200/MI300) and ROCm ecosystem
Compiler optimization knowledge
Distributed inference and multi-GPU workloads
ML model quantization, pruning, and optimization techniques
High-performance computing or systems-level optimization
Infrastructure-as-code experience:
Kubernetes, Docker, Terraform
Contributions to open-source ML or systems projects
Detail-oriented — able to spot subtle regressions
Self-driven and accountable
Collaborative and team-oriented
Passionate about sustainable AI
Clear and effective communicator
Benefits
Competitive salary, dependent on experience and location
Equity and bonus opportunities
Medical, dental, and vision coverage
Retirement savings plan
Additional wellness benefits
Why Join
Build the infrastructure that validates high-performance ML models
Influence core product quality and customer outcomes
Work in a highly technical, high-impact environment at the forefront of AI systems
Collaborate across a globally distributed team