Software Engineer, Reliability Platform

San Francisco, CA;
Sunnyvale, CA;
Seattle, WA

About the Team

The Reliability Platforms organization is part of DoorDash’s Production Lifecycle team, which owns the end-to-end experience of how engineers safely change, observe, and operate production systems.

Our mission is to enable teams to confidently make changes to production, understand reliability and service health on demand, and abstract complexity into platforms for common operations with built-in guardrails that make safe, repeatable operations the default.

Reliability Platforms builds platforms as products that touch every production change, with two primary areas of focus:

Self-Serve Infrastructure & Configuration Change Control - building the systems engineers use to provision services, request cloud resources, and safely make config changes across traffic, compute, and secrets
Reliability & Service Health - delivering unified health scores, SLOs, alerting pipelines, and automation that help engineers know what’s happening, improve reliability, and act quickly when something goes wrong

Together, these focus areas form the backbone of how DoorDash engineers safely ship, observe, and remediate production systems.

About the Role

As a Software Engineer on Reliability Platforms, you’ll help design and build the systems that sit at the center of every production change at DoorDash.

You’ll work on the APIs, UIs, and automation that thousands of DoorDash engineers rely on every day to:

Spin up new services and request cloud resources without waiting on another team
Safely roll out configuration changes, both service and infra, with progressive delivery
See at a glance whether their services are healthy and how they’re performing against SLOs
Diagnose and remediate incidents faster, or let the system fix them automatically

And as we look forward, you’ll play a key part in shaping the future of agentic, AI-assisted operations at DoorDash: systems that can propose, validate, and even execute production changes autonomously, moving us toward a world where production is proactive and self-healing by default.

Your work will shape the developer experience for production. You’ll collaborate with product engineers, other infrastructure engineers, and platform peers to create durable abstractions that make reliability the path of least resistance.

This is a high-leverage role where the platforms you build will be used by every engineer at DoorDash, multiplying your impact across the entire company.

You’re excited about this opportunity because you will…

Build Self-Serve Platforms: Design and develop systems in Go that let engineers safely request infra, configure services, and manage production state.
Deliver Safe Change Workflows: Add guardrails, validation, and progressive rollout capabilities for infra and config changes.
Enhance Reliability Feedback: Provide pre-flight checks, posture scoring, and unified health views to catch issues before they reach production.
Automate Recovery: Contribute to systems that remediate incidents automatically or guide engineers through resolution quickly.
Create a Unified Experience: Help evolve our UIs and APIs into a single entry point for production change and health insights.
Shape the Future of Operations: Experiment with agentic, AI-assisted workflows that can propose, validate, and safely execute production changes, moving DoorDash toward proactive, self-healing systems.

We’re excited about you because…

Platform Engineering Mindset: You think in terms of APIs, abstractions, and workflows — you enjoy building systems that other engineers depend on every day.
Proven Experience: You have 2+ years of experience in an infrastructure, platform, or backend engineering role, showing you can deliver and maintain complex systems.
Backend Development Skills: You’re fluent in Go (or a similar language) and can design scalable, resilient services that are easy to operate.
Cloud & Infra Expertise or even SRE: You’re comfortable with AWS primitives, security best practices, and Infrastructure as Code tools like Terraform or Pulumi, and you know when to abstract them away for other engineers. You understand concepts like SLOs, error budgets, and…

