Site Reliability Engineer

Job in Toronto, Ontario, C6A, Canada
Listing for: Artech LLC
Full Time position
Listed on 2026-06-04
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing, Systems Engineer
Job Description & How to Apply Below

Title: Site Reliability Engineer

Location: Toronto, Ontario

Duration: 12 months

Pay range: C49 INC

Years of Experience: 6-8

We are seeking a Site Reliability Engineer (SRE) to ensure the reliability, scalability, and performance of platform services. The ideal candidate will bring strong expertise in SRE practices, observability, infrastructure automation, and developer platform enablement, with exposure to modern technologies including policy-as-code and emerging GenAI-driven systems.

Key Responsibilities
  • Implement and manage SRE practices including:
    • Incident management, root cause analysis, and postmortems
    • Reliability engineering and performance optimization
    • Tracking and improving DORA metrics
  • Define and monitor Service Level Indicators (SLIs) and Service Level Objectives (SLOs)
  • Build and manage monitoring, logging, and distributed tracing frameworks
  • Ensure platform reliability through proactive alerting, observability, and automation
  • Automate infrastructure and governance using:
    • Terraform (Infrastructure as Code)
    • Policy-as-Code tools (OPA/Rego, Sentinel)
  • Enhance developer experience and productivity by:
    • Designing self-service platform capabilities
    • Managing service catalogs and platform standards
    • Building reusable templates and golden paths
  • Work with tools like Backstage to enable internal developer platforms
  • Collaborate with engineering teams to improve system stability, deployment reliability, and operational efficiency
  • Support integration and reliability considerations for GenAI-based systems (RAG, prompt workflows, model evaluation)
Required Skills
  • Strong experience in SRE practices and reliability engineering
  • Hands‑on expertise with monitoring/logging platforms and distributed tracing
  • Experience with SLO/SLI frameworks and observability design
  • Experience in incident management and performance engineering
  • Strong understanding of DORA metrics and operational excellence
  • Proficiency in Terraform (Infrastructure as Code)
  • Policy as Code (OPA/Rego, Sentinel)
  • Experience with developer platform tools (Backstage, service catalogs)
  • Golden paths and platform standardization
Nice to Have
  • Exposure to GenAI platforms, RAG, and prompt engineering concepts
  • Experience in developer productivity measurement and platform engineering initiatives
Tools & Methodologies
  • Experience with Agile methodologies (Jira, Confluence)
  • Familiarity with DevOps and platform engineering practices
Soft Skills
  • Strong problem‑solving and analytical skills
  • Ability to work in high‑pressure production environments
  • Excellent communication and cross‑team collaboration