SRE, Ads
Listed on 2026-06-22
Location: Reddit has a flexible-first workforce. Don't live near our office? No worries: you can work remotely from anywhere in the UK, the Netherlands, or Ireland.
The Ads organization powers Reddit's advertising platform, enabling advertisers to reach highly engaged communities while helping Reddit grow its business. The reliability of our Ads systems directly impacts advertiser success, revenue generation, and user experience.
The Ads Reliability team partners closely with Ads Engineering teams to improve reliability, scalability, operational excellence, and developer productivity across Reddit's advertising ecosystem.
We're looking for a Staff Site Reliability Engineer who will provide technical leadership for reliability initiatives across the Ads organization and help shape the future of Ads infrastructure at Reddit.
What you’ll do:
- Lead reliability initiatives across multiple Ads domains including ad serving, auctions, targeting, reporting, measurement, and billing.
- Partner with engineering leadership to improve reliability, scalability, operational excellence, and engineering efficiency across the Ads organization.
- Drive architecture reviews and influence technical decisions impacting critical revenue-generating systems.
- Design and build platforms, tooling, and automation that improve reliability and developer productivity at scale.
- Participate in on‑call rotations, lead complex incident investigations, and coordinate cross‑functional response efforts during major production events.
- Identify systemic reliability risks and drive long‑term solutions that improve platform resilience.
- Establish reliability metrics around advertiser‑critical user journeys such as campaign creation, ad delivery, auction participation, reporting, attribution, and billing.
- Mentor engineers and provide technical leadership across multiple teams.
- Influence roadmap planning and ensure reliability considerations are incorporated into product and infrastructure investments.
Qualifications:
- 8+ years of experience in Site Reliability Engineering, Infrastructure Engineering, or related roles operating large-scale distributed systems.
- Strong experience supporting high-traffic, user-facing production environments.
- Deep understanding of distributed systems, networking, Linux systems, and cloud-native architectures.
- Experience designing highly available systems with strong operational and reliability practices.
- Strong understanding of observability systems including metrics, logging, tracing, and alerting.
- Good programming skills in languages such as Go, Python, or similar.
- Experience improving reliability through SLOs, automation, incident management, and performance optimization.
- Demonstrated ability to troubleshoot complex issues across a modern distributed system stack.
- Strong collaboration and communication skills with the ability to influence technical direction across teams.
- Experience supporting advertising technology platforms or other large