
Engineering Manager, SRE

Job in Toronto, Ontario, M5A, Canada
Listing for: Index Exchange
Full Time position
Listed on 2025-12-31
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Job Description & How to Apply Below

At Index Exchange, we’re reinventing how digital advertising works. As a global advertising supply-side platform, we empower the world’s leading media owners and marketers to thrive in a programmatic, privacy-first ecosystem.

We’re a proud industry pioneer with over 20 years of experience accelerating the ad technology evolution. Our proprietary tech is trusted by some of the world’s largest brands and media owners and plays a crucial role in keeping the internet open, accessible, and largely free.

We process more than 550 billion real-time auctions every day (in comparison, Google processes 8.5 billion searches per day) with ultra-low latency. Our platform is vertically integrated from servers to networks and runs primarily on our own metal and cloud infrastructure. This end-to-end infrastructure is designed to provide both stability and agility, enabling us to adapt quickly as the market evolves.

At the core of it all is our engineering-first culture. Our engineers tackle internet-scale problems across tight-knit, global teams. From moving petabytes of data and optimizing with AI to making real-time infrastructure decisions, Indexers have the agency and influence to shape the future of advertising. We move fast, build thoughtfully, and stay grounded in our core values.

About The Role

We are seeking an experienced Engineering Manager with a strong background in Site Reliability Engineering (SRE) to lead and develop a high-performance team of engineers. The ideal candidate will have a deep technical understanding of on-premise and hybrid cloud environments and a proven track record of managing SRE teams in a global setting.

Here’s What You’ll be Doing

  • Team Leadership: Build and lead a world-class SRE team, fostering a culture of innovation, collaboration, and accountability. Provide mentorship, guidance, and professional development opportunities to team members.
  • Technical Expertise: Possess a deep understanding of on-premise and hybrid cloud environments, with a focus on optimizing low-latency performance on Kubernetes platforms that support a robust developer experience framework.
  • Operational Excellence: Drive operational excellence through proactive monitoring, automation, and the development of robust incident management processes. Ensure the team meets and exceeds service level objectives (SLOs) and service level indicators (SLIs).
  • Software Engineering Skills: Collaborate with software engineering teams to implement SRE best practices in the software development life cycle, including designing scalable and resilient systems.
  • Incident Management: Lead incident response efforts, ensuring rapid resolution and post-incident analysis to prevent recurrence. Maintain incident reports and contribute to continuous improvement.
  • Reporting and Metrics: Develop and maintain meaningful performance metrics and reporting mechanisms to track the health and reliability of our systems. Use data-driven insights to guide decision-making and triaging.
  • Global Scale: Manage SRE operations at global scale, considering regional nuances and ensuring consistent, reliable service delivery across geographies.
  • Project Management: Act as a technical leader on projects, architecting designs that meet business needs and align with the existing architectural vision. Collaborate with subject matter experts and a network of peers to ensure on-time, quality delivery.

Here’s What You Need

  • Proven experience (6+ years) in SRE roles, with a focus on low-latency, global-scale environments built on upstream Kubernetes.
  • Strong software engineering skills, including proficiency in programming languages such as Golang, Python, and Perl.
  • Excellent understanding of on-premise and hybrid cloud architectures.
  • Exceptional leadership and team-building skills, with a track record of developing high-performing teams and at least 3 years of experience in a leadership role.
  • Expertise in incident management, root cause analysis, and post-incident reviews.
  • Strong analytical and problem-solving abilities.
  • Experience with industry-standard SRE tools and technologies within the CNCF portfolio.
  • Excellent communication skills, with the ability to…