Staff ML Software Engineer (L6), Platform Systems, AIMS Engineering

Job in Los Gatos, Santa Clara County, California, 95032, USA
Listing for: Netflix
Full Time position
Listed on 2026-07-04
Job specializations:
  • Software Development
    AI Engineer (Applied/Software), Machine Learning/ ML Engineer, Cloud Engineer - Software
Position: Staff ML Software Engineer (L6) — Platform Systems, AIMS Engineering

Staff ML Software Engineer (L6)

Los Gatos, California, United States of America

At Netflix, our mission is to entertain the world. Together, we are writing the next episode - pushing the boundaries of storytelling, global fandom and making the unimaginable a reality. We are a dream team obsessed with the uncomfortable excitement of discovering what happens when you merge creativity, intuition and cutting-edge technology. Come be a part of what's next.

About the Job

AI for Member Systems (AIMS) runs the AI systems behind every recommendation, search result, and personalized experience for 300M+ members. The stack powering it is large and battle-tested, built to meet the demands of its time, and remarkably effective at doing so. But AI/ML is moving fast, and the infrastructure that got us here needs to evolve to meet what's next: new model paradigms, tighter cost and efficiency expectations, and the operational maturity that comes with running AI at this scale.

Migrating to a next-generation AI/ML platform is one of the highest-leverage programs in AIMS. So is building the observability and cost infrastructure that makes that platform trustworthy. This role owns that problem end-to-end.

Platform Systems is the engineering foundation of AIMS, owning reliability, scalability, cost efficiency, and developer experience across the org. We are looking for a Staff ML Software Engineer to own the technical health of the AIMS AI/ML stack — modernizing it, and building the observability and cost infrastructure that makes that modernization trustworthy. This is a high-leverage, cross-cutting role — the work you do here will define how AIMS builds AI/ML systems for the next decade.

While the initial migration marks our first major initiative, our ongoing goal is to establish sustainable practices for the long term.

Responsibilities

  • Define the end-state architecture for the modernized AIMS AI/ML stack: how it is organized, what contracts each layer exposes, and what the migration path looks like across training pipelines, AI frameworks, and data infrastructure
  • Drive end-to-end migration of AIMS AI/ML systems onto a modern, Python-native platform, coordinating across multiple AIMS teams and external platform partners, with dozens of production models in flight
  • Build migration tooling and shared abstractions that reduce the cost of adoption for individual teams, so modernization does not require each team to solve the same problems independently
  • Own scalability across training throughput and data pipelines, ensuring AIMS AI/ML systems stay performant as model complexity and member traffic grow
  • Design and build observability systems that give AIMS ML practitioners deep visibility into model behavior, training pipeline health, serving latency, and data quality, making issues detectable and diagnosable before they become incidents
  • Identify and drive cost optimization across AIMS training and serving infrastructure, developing frameworks and tooling that make compute efficiency a first-class concern, not an afterthought
  • Architect reliability improvements across the AIMS AI/ML stack, reducing toil, improving on-call ergonomics, and setting the standard for operational excellence across the org
  • Prototype and productionize GenAI-powered tooling for anomaly detection, root cause analysis, and operational automation, applying LLM-based systems to the problems of AI/ML reliability and cost at scale
  • Surface systemic cost, reliability, and migration gaps by embedding with AI/ML teams across AIMS, and translate their friction into concrete engineering investments with org-wide leverage
  • Set technical standards for the modernized stack and raise the engineering bar across AIMS through design reviews, architectural guidance, and leading by example
  • Own the long-term architectural evolution of the AIMS AI/ML stack — continuously evaluating emerging infrastructure patterns, model paradigms, and platform capabilities, and translating them into a forward-looking roadmap before they become urgent migrations

What We're Looking For

  • Significant experience designing, building, and operating large-scale production AI/ML systems, including training…