Overview
As a Senior Platform Engineer, you will sit at the intersection of infrastructure reliability and developer experience. You will not only maintain the scalability of our Kubernetes‑based cloud environment but also work collaboratively on the shift toward a self‑service platform model. In this role, you will apply a software engineering mindset to infrastructure, leveraging Agentic AI to automate complex workflows and building the "Golden Paths" that eliminate manual bottlenecks.
You aren’t just managing clusters; you are architecting the software that manages them, ensuring our systems are resilient, cost‑optimized, and inherently self‑healing. This position will be hybrid out of our Toronto office.
- IDP Architecture & Development: Lead the design and implementation of an Internal Developer Platform that abstracts infrastructure complexity, providing developers with self‑service capabilities for environment scaling and deployment.
- Collaborative Infrastructure: Work with the team to maintain and scale multi‑region Azure/AKS environments using Terraform and ArgoCD.
- Reliability Partnership: Collaborate with application engineers to implement deep observability via Datadog and establish meaningful reliability targets (SLOs).
- AI‑Driven Automation: Build Python‑based AI agents and tools that reduce team "toil" and streamline common operational tasks.
- Incident Response: Participate in on‑call rotations and lead collective efforts in root cause analysis (RCA), ensuring the team learns and improves from every incident.
- Full‑Lifecycle Ownership: Participate in on‑call rotations and lead collective root cause analysis (RCA), turning every incident into a platform improvement or automated fix.
- Mentorship & Standards: Contribute to team code reviews, document best practices, and help establish standard patterns for deployment and security.
- Bachelor’s/Master’s degree in computer science, a related technical field, or equivalent practical experience.
- 5+ years of professional Software Development experience with a deep understanding of the full SDLC, system design, and clean code principles.
- 7+ years of experience in DevOps or Systems Engineering, with a focus on running large‑scale production environments.
- Proficiency in Python or Go for automation, tool building, and AI agent integration.
- Experience managing Kubernetes (AKS) and Infrastructure as Code (Terraform/Terragrunt) in a team‑based workplace.
- Proven track record of building or contributing to an Internal Developer Platform (IDP) (e.g., Backstage, internal CLI tools, or custom portals).
- Experience collaborating with application teams to define SLIs/SLOs and improve service reliability.
- Demonstrated ability to mentor junior engineers and lead team‑wide technical initiatives.
- Experience implementing GitOps (ArgoCD) and Policy‑as‑Code (Kyverno) to standardize team workflows.
- Familiarity with building Agentic AI tools to automate repetitive operational tasks (toil).
- Expertise in Datadog for observability, dashboarding, and incident response.
The annual base salary range for this position is $115,000 - $140,000. Additionally, this position is eligible for an annual discretionary bonus based on performance.
You will also be eligible for the following benefits:
- Paid time off
- Comprehensive benefits plan
- Company RRSP match
- Development opportunities through the LinkedIn Learning platform
We are committed to upholding policies that contribute to an equitable and welcoming setting for our community, regardless of background, identity, or experience.