Senior Software Engineer - Observability & IRM
Listed on 2026-05-16
IT/Tech
IT Support, Cloud Computing
Overview
The Trade Desk is a global technology company with a mission to create a better, more open internet for everyone through principled, intelligent advertising. Handling over 1 trillion queries per day, our platform operates at an unprecedented scale. We value the unique experiences and perspectives that each person brings and foster inclusive spaces where everyone can bring their authentic selves to work every day.
Do you have a passion for solving hard problems at scale? Are you eager to join a dynamic, globally connected team where your contributions will make a meaningful difference in building a better media ecosystem?
About the Team
The Service Excellence (SE) team owns the tools and infrastructure that help engineers understand and operate production systems. The Incident Response Services (IRS) taskforce focuses on the on-call experience. The team is responsible for making incidents easier to detect, manage, and optimize using historical data.
Responsibilities
- Incident management tooling: Build and maintain automation around the incident lifecycle, including alerting, escalation, incident channels, retros, and SLA tracking
- Evaluate and migrate our logging stack: Participate in re-evaluating our logging vendor and collection architecture
- Backstage/Service catalog: Extend our internal developer portal with Kubernetes integrations, maturity models, and SLO adoption tooling
- Alert quality tooling: Build systems that give engineers better signal and less noise, smarter routing, and tighter feedback loops between alerts and owning teams
Qualifications
- Experience building and operating production infrastructure or internal developer tooling
- Comfort working across the stack: the role touches distributed systems, Kubernetes, observability pipelines, and web-based tooling
- Familiarity with observability concepts: logging, alerting, on-call workflows
- Strong debugging instincts; able to respond when things break
- Clear communication; ability to explain tradeoffs and advocate for solutions
- Experience with Grafana, Prometheus, or similar observability tools
- Familiarity with Sumo Logic or other log management platforms
- Prior work on developer portals or service catalog tooling (Backstage, OpsLevel, etc.)
- Experience with Kubernetes at scale
Variety of technical opportunity is a key aspect of working at The Trade Desk. We value quick learning and finding solutions to complex problems using the best tools for the job. We expect engineers who can invent answers to questions not yet asked.
Note: The Trade Desk does not accept unsolicited resumes from search firm recruiters. The Trade Desk is an equal opportunity employer. All employment decisions are based on merit, competence, performance, and business needs. We do not discriminate on any protected characteristic under federal, state, or local law.
In accordance with US state laws, the range provided is a reasonable estimate of base compensation. The actual amount may differ based on experience, knowledge, skills, and location. Eligible employees may receive stock-based compensation grants and other compensation where applicable. Benefits include comprehensive healthcare for employees and dependents, retirement benefits, disability coverage, life insurance, well-being benefits, tuition reimbursement, parental leave, vacation and holidays, and a stock purchase plan.
For more information or to request accessibility accommodations, please contact our recruiting team.