Site Reliability Engineer; hybrid or remote

Remote / Online - Candidates ideally in
Toronto, Ontario, C6A, Canada
Listing for: Achievers
Full Time, Remote/Work from Home position
Listed on 2026-02-07
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, IT Project Manager, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 124,000 CAD yearly
Job Description & How to Apply Below
Position: Staff Site Reliability Engineer (hybrid or remote)
Our Site Reliability Engineering team sits at the intersection of software engineering and operations, building reliable, scalable cloud systems that our teams and customers can trust.
As a Staff Site Reliability Engineer, you'll play a critical role in the management and advancement of our global infrastructure. You'll draw on approximately 15 years of technical expertise, focusing specifically on the evolution of high-concurrency, distributed systems and the orchestration of hyper-scale cloud environments. In this position, you will architect our GCP/GKE environment and lead the integration of AI-driven workflows. This includes using bots, automated PR remediation, and intelligent alerting so that our platform can scale efficiently and reliably.
Why you'll love this role:
Lead high-impact initiatives that shape how millions of people experience work around the world.
Bring your unique perspective to complex and challenging projects - apply your expertise in architecture, influence technical direction, and mentor fellow team members.
Join a close-knit, no-ego, high-performing team that solves meaningful problems and celebrates successes together.
Work alongside an experienced leadership team that is genuinely invested in your career growth.
Thrive in a fast-paced, high-growth environment where innovation is encouraged and your voice truly matters.
How you’ll shape our cloud infrastructure:
Architectural Leadership: Lead the design and ongoing evolution of our global, high-availability infrastructure, focusing on Google Cloud Platform (GCP) and Kubernetes (GKE).
AI & Automation Strategy:  Identify repetitive operational tasks and implement AI-integrated workflows, such as Slack or Teams bots for incident triage, AI-augmented alerting, and automated PR generation to address infrastructure drift.
Cross-Functional Influence:  Collaborate with Product, Engineering, and Leadership teams to identify systemic risks, manage complex changes, and define the long-term reliability roadmap.
Infrastructure-as-Code (IaC):  Establish and exemplify best practices for Terraform and CI/CD pipelines, empowering development teams to deploy code rapidly and securely.
System Resiliency:  Lead high-level initiatives in disaster recovery, multi-region networking, and the design of zero-trust security architectures.
Technical Mentorship:  Guide design reviews and promote best practices, enhancing the technical skills and capabilities of the entire SRE organization.
Experience we feel will set you up for success:
The 15-Year Lens:  Possess extensive systems engineering experience, with in-depth knowledge of Linux kernels, network protocols (TCP/IP, BGP, DNS), and cloud-native architecture.
GCP Expertise: Demonstrated, hands-on experience in architecting and managing production workloads on Google Cloud Platform and GKE.
AI/Workflow Automation:  Practical experience or a strong vision for integrating AI tools and LLMs to automate SRE tasks, documentation, or incident response.
Code Proficiency: Advanced skills in Python or Go, with the ability to develop sophisticated internal tools and automation frameworks.
Observability Mastery:  Expert understanding of observability frameworks (such as New Relic, Prometheus, Grafana) to enable data-driven decision-making.
Database Foundations: Deep knowledge of managing relational databases (MySQL, MongoDB)
Communication: Exceptional ability to clearly convey complex technical infrastructure challenges as actionable business insights to non-technical stakeholders.
The Achievers Mindset:
Disruptive Innovator: Set industry trends by applying emerging technologies like AI to address longstanding infrastructure challenges.
Self-Starter:  Maintain a mindset of continuous improvement, always seeking opportunities to automate processes.
Culture of Success:  Believe that platform reliability is fundamental to both employee success and customer trust.
Bonus Points:
Hands-on experience with Service Mesh (Istio) and advanced GCP Networking features, such as Interconnect and Shared VPC.
A proven history of migrating legacy automation systems to modern,…