Machine Learning Engineer - Decisioning & Optimization Job | Seattle area, Washington, USA | IT/Tech

Position: Machine Learning Engineer 5 - Decisioning & Optimization
At Netflix, our mission is to entertain the world. Together, we are writing the next episode - pushing the boundaries of storytelling and global fandom, and making the unimaginable a reality. We are a dream team obsessed with the uncomfortable excitement of discovering what happens when you merge creativity, intuition, and cutting-edge technology. Come be a part of what's next.

We launched a new ad-supported tier in November 2022 and are building a world-class, in-house ad tech ecosystem to offer our members more choices in consuming their content. Our new tier allows us to attract new members at a lower price point while also creating a compelling path for advertisers to reach deeply engaged audiences.

## Our Team

The Decisioning & Optimization engineering team owns the systems that determine which ad wins every impression, at what price, and how campaign budgets deliver across all inventory surfaces. Our work spans three platform areas:

* ML infrastructure for model serving: real-time inference at 1M+ QPS, multi-model parallel evaluation, feature hydration, and model lifecycle management from canary deployment through production monitoring

* Auction, ranking, and scoring: multi-stage candidate selection, scoring, bid valuation, dynamic pricing, and podding

* Budget, pacing, and bidding: control systems for delivery optimization, budget planning, and bid computation

We are scaling from a handful of production models to 10+ while maintaining sub-20ms P99 inference budgets. We are looking for an ML engineer who can build and operate the serving infrastructure these models run on, and who understands the ads decisioning context well enough to make the right engineering tradeoffs.

## What You'll Do

* Build and operate end-to-end ML model serving infrastructure for real-time ad decisioning: model publishing, packaging, validation, and deployment into the serving stack with zero-downtime hot-swap

* Scale the inference path to support dozens of concurrent models on every ad request at 1M+ QPS with strict latency budgets, including batching strategies, CPU/GPU allocation, model versioning, and fallback tiers

* Design and optimize the feature serving path: feature hydration from Chronon, Signal Service, and real-time streams with sub-10ms P99 fetch latency and online/offline consistency

* Productionize scoring and ranking models for multi-stage ad selection (retrieval, early ranking, full scoring) and integrate model outputs into the auction

* Build model performance monitoring in production: inference latency, prediction distribution shifts, feature drift detection, score calibration, and regression detection before revenue impact

* Partner closely with Data Science & Platform teams

* Build simulation infrastructure to replay production traffic against candidate models offline, enabling validation of marketplace changes before live rollout

* Drive operational excellence for ML systems: reliability, observability, capacity planning, incident response, and scaling for live events with 35M+ concurrent viewers

## Skills & Experience We're Seeking

* 7+ years of software engineering experience; 3+ years focused on ML infrastructure, model serving, or ML platform work in an ads or real-time decisioning context

* Built and operated real-time model serving systems at high QPS with sub-20ms latency: online inference, feature stores, model registries, model hot-swap, canary and shadow rollout

* Proficiency in Java, Python, or Scala with a solid understanding of multi-threading, memory management, and performance optimization for latency-critical paths

* Hands-on with ML serving frameworks: serialization, runtime optimization, and deployment constraints

* Experience with feature engineering pipelines for real-time systems: online/offline consistency, hydration strategies, caching, and freshness tradeoffs

* Strong understanding of model monitoring in production: drift detection, prediction distribution analysis, calibration, and latency profiling

* Comfortable working at the boundary between ML research and production engineering: able to take a model artifact and turn it into a production-ready service that meets its SLA

* Demonstrated ability to operate in an…