Sr. Engineering Manager, Tooling and Reliability Platforms

Job in Richardson, Dallas County, Texas, 75080, USA
Listing for: Yahoo Holdings Inc.
Full Time position
Listed on 2026-05-20
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly
Job Description & How to Apply Below

It takes powerful technology to connect our brands and partners with an audience of hundreds of millions of people. Whether you're looking to write mobile app code, engineer the servers behind our massive ad tech stacks, or develop algorithms to help us process trillions of data points a day, what you do here will have a huge impact on our business and the world.

A Little About Us

Our Tooling and Reliability Platforms team operates as a foundational pillar of the Central Technology Organization. We provide the "paved road" for Yahoo's diverse verticals, enabling them to ship world-class products at a global scale. Our mission is to build modern, secure, and highly efficient platforms that power all of Yahoo's brands, with a relentless focus on Engineered Resilience.

A Lot About You

We are looking for a strategic Senior Engineering Manager (M4) to lead our Tooling & Reliability Platforms team. You are a Product Lead for the "paved road" of reliability at Yahoo, managing a large squad of engineers responsible for our incident management ecosystem while evolving these tools into a comprehensive, AI-augmented Reliability Platform.

You are strategic about the north star of Engineered Resilience, owning the roadmap for automated diagnostics and chaos engineering. You foster a culture of high-trust and continuous experimentation, where engineers are empowered to use modern tools to solve complex reliability challenges. You understand that in a modern engineering org, reliability is achieved through a mix of elite software engineering and intelligent automation.

Key Responsibilities
  • Engineering Leadership & Productivity:
    Manage and grow a high-performing team. Identify and implement AI-driven efficiencies in the product lifecycle to accelerate platform delivery and engineering productivity.
  • Product & Workflow Ownership:
    Treat the reliability stack as a product. Define the roadmap for the Incident Management platform, ensuring these tools reduce cognitive load for hundreds of service teams by replacing manual investigation steps with AI-assisted workflows.
  • AIOps & Governance:
    Drive the integration of GenAI and SRE Agents into production environments. Establish frameworks for validating AI-generated incident summaries and hypothesis generation to ensure accuracy and prevent automated hallucinations.
  • Resilience Engineering:
    Define the vision for the next generation of Resilience Engineering, focusing on building services that make products inherently resilient through automated alert diagnostics and self-healing systems.
  • Vendor Advocacy:
    Act as a high-leverage partner to our key vendors, holding them accountable for roadmap delivery and ensuring their features align with our team vision.
Who You Are
  • A Builder & A Leader:
    Experience managing manager-level or senior IC reports in a high-scale environment, with a track record of building internal platforms.
  • Product-Minded:
    You don't just "install" tools; you architect a "paved road" that engineers want to use, focusing on reducing friction through intelligent automation.
  • AI-Forward:
    You possess a commitment to combining SRE with LLMs and have the expertise to convert AI potential into effective, real-world automation and structured prompt interaction with AI tools.
  • Strategic & Adaptive:
    Ability to manage day-to-day operations while pivoting strategy to account for emerging AI-driven reliability trends.
Basic Qualifications
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
  • 5+ years of experience leading SRE or DevOps teams in a high-scale, cloud-native environment.
  • Strong background in Software Engineering (Python, Go, or Java) and Infrastructure-as-Code.
  • Deep familiarity with incident management and AIOps tools (e.g., Rootly, PagerDuty, BigPanda).
  • Experience evaluating and refining AI-generated outputs in a technical or operational context.
  • Proven ability to collaborate with SaaS partners to influence a collective product vision.
  • Comfort operating in an evolving AI-augmented environment with a focus on continuous learning.
Preferred Qualifications
  • East Coast time zone preference
  • Experience with BCP/DR planning or…