Software Engineer, ML Serving Platform
Remote / Online - Candidates ideally in Seattle, King County, Washington, 98127, USA
Listing for: DoorDash
Remote/Work from Home position
Listed on 2026-01-01
Job specializations:
- Software Development: Software Engineer, Cloud Engineer - Software
Job Description & How to Apply Below
San Francisco, CA; Sunnyvale, CA; Seattle, WA; New York, NY
About the Team
DoorDash is building the world’s most reliable on-demand logistics engine. Behind the scenes, our Machine Learning Platform (MLP) powers real-time decision-making for millions of orders each day, supporting business‑critical use cases like Ads, Groceries, Logistics, Fraud, and Search.
About the Role
We’re looking for a Staff Software Engineer with deep expertise in ML model serving to drive the next generation of our inference platform. This is a highly technical, hands‑on role: you’ll design and build systems that power real‑time predictions across millions of requests per second, tackling challenges in reliability, efficiency, and cost‑aware scaling. Success in this role requires both technical mastery and the ability to lead through collaboration.
You’ll collaborate with core infrastructure teams (compute, storage, networking, dev platform) and with applied ML teams across Ads, Fraud, Logistics, Search, and more who depend on our platform to bring their models to production. You’ll also tap into the best of open‑source frameworks and vendor solutions – contributing back where it makes sense – to accelerate innovation. As Staff Software Engineer, you’ll pair deep technical execution with influence on the roadmap, ensuring our serving systems scale reliably as model architectures and business needs evolve.
In this role, you will:
• Scale richer models at low latency – Design serving systems that handle large, complex models while balancing cost, throughput, and strict latency SLOs.
• Bring modern inference optimizations into production – Operationalize advances from the ML serving ecosystem (e.g., efficient caching, attention optimizations, batching, quantization) to deliver a better user experience, lower latency, and improved cost efficiency across our fleet.
• Enable platform‑wide impact – Build abstractions and primitives that let serving improvements apply broadly across many workloads, rather than point solutions for individual models.
• Leverage and contribute to OSS – Apply the best of the open‑source serving ecosystem and vendor solutions, and contribute improvements back where it helps the community.
• Drive cost & reliability – Design autoscaling and scheduling across heterogeneous hardware (GPU/TPU/CPU), with strong isolation, observability, and tail‑latency control.
• Collaborate broadly – Partner with ML engineers, infra teams, external vendors, and open‑source communities to ensure our serving stack evolves with the needs of the business.
• Raise the engineering bar – Establish metrics & processes that improve developer velocity, system reliability, and long‑term maintainability.
We’re excited about you because you…
• Have 8+ years of engineering experience, including building or operating large‑scale, high‑QPS ML serving systems.
• Bring deep familiarity with ML inference and serving ecosystems.
• Know how to leverage and extend open‑source frameworks and evaluate vendor solutions pragmatically.
• Balance hands‑on execution with long‑term platform thinking, making sound trade‑offs.
• Care deeply about reliability, performance, observability, and security in production systems.
• Lead by example – collaborating effectively, mentoring peers, and setting a high bar for craftsmanship.
Nice To Haves
• GPU serving expertise – Experience with frameworks like NVIDIA Triton, TensorRT‑LLM, ONNX Runtime, or vLLM, including hands‑on use of KV caching, batching, and memory‑efficient inference.
• Familiarity with deep learning frameworks (PyTorch, TensorFlow) and large language models (LLMs) such as GPT‑OSS or BERT.
• Hands‑on experience with Kubernetes/EKS, microservice architectures, and large‑scale orchestration for inference workloads.
• Cloud experience (AWS, GCP, Azure) with a focus on scaling strategies, observability, and cost optimization.
• Prior contributions to OSS serving ecosystems (e.g., vLLM, Triton plugins, KServe) or active participation in developer communities.
Notice to Applicants for Jobs Located in NYC or Remote Jobs Associated With Office in NYC Only
We use Covey as part of our…