Lead ML Inference Engineer; Advertising
Job in
San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-06-17
Listing for:
Roku
Full Time
Job specializations:
- IT/Tech
- Machine Learning/ML Engineer, AI Engineer (Applied/Software)
Job Description
Requirements
- We’re looking for a strong technical leader with deep experience in ML serving, high-performance computing, and industry-standard frameworks - someone excited to mentor engineers, innovate at scale, and shape the future of machine learning at Roku
- M.S. or above in CS, ECE, or a related field
- 10+ years of experience in developing and deploying large-scale, distributed systems, with at least 5 years in a leadership or technical lead role
- Strong programming skills in high-performance languages
- Deep understanding of inference frameworks and ML system deployment
- Proven experience optimizing performance for large-scale machine learning systems, including deep knowledge of state-of-the-art (SOTA) model optimizations, hardware-software co-design, GPU acceleration, and HPC techniques
- Excellent communication and collaboration skills
- Experience leading teams working on high-throughput, low-latency ML serving systems
- Experience collaborating with and leading global, cross-functional teams
- Contributions to open-source ML or systems projects
About the team
- The Advertising Performance group focuses on performance for all participants in the Advertising ecosystem - Advertisers, Publishers, and Roku
- The systems and solutions span multiple disciplines and technologies to perform real-time multi-objective optimization across distributed systems at large scale and with low latency. We use Machine Learning, Reinforcement Learning, AI, Control and Optimization Systems, and Auction Dynamics to solve a large set of complex problems
- At the core of this is our Machine Learning and Inference Platform that powers the entire landscape
- In this role, you will architect, design, and lead the development of a SOTA Inference platform that can handle Advertising-level low latencies, scale, throughput, and availability with optimizations that span hardware, software, and model
Responsibilities
- Lead the design and development of a SOTA Inference platform
- Oversee the development of monitoring, observability, and other tooling to ensure system and model performance, reliability, and scalability of online inference services
- Identify and resolve system inefficiencies, performance bottlenecks, and reliability issues, ensuring optimized end-to-end performance
- Stay at the forefront of advancements in inference frameworks, ML hardware acceleration, and distributed systems, and incorporate innovations where and when they are impactful