Site Reliability Engineer (SRE) – Observability

Job in Toronto, Ontario, C6A, Canada
Listing for: Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!
Full Time position
Listed on 2026-03-10
Job specializations:
  • IT/Tech
    IT Support, Systems Engineer, Cloud Computing, SRE/Site Reliability
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 CAD yearly
Job Description & How to Apply Below
Position: Site Reliability Engineer (SRE) – Observability

Job Description:

Toronto - Hybrid (1-2 days office)

We are looking for an Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms. This role focuses on hands‑on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers.

You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production‑ready.

Key Responsibilities

Observability Implementation
  • Implement and maintain metrics, logs, and traces for applications and infrastructure
  • Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog)
  • Configure dashboards, alerts, and basic anomaly detection
Application Support & Instrumentation
  • Work with development teams to enable structured logging, basic distributed tracing, and core metrics
  • Validate observability requirements during Production Readiness Reviews (PRR)
  • Troubleshoot missing or low‑quality telemetry
Monitoring & Alerting
  • Configure alerts based on golden signals (latency, errors, traffic, saturation)
  • Help reduce alert noise by tuning thresholds and alert logic
  • Support incident response by gathering logs, metrics, and traces
Operations & Reliability
  • Support root cause analysis using observability tools
  • Maintain dashboards and documentation used by on‑call and support teams
  • Participate in on‑call rotations (as applicable)
Automation & Continuous Improvement
  • Assist in automating observability onboarding and validation tasks
  • Create and maintain reusable dashboards and alert templates
  • Follow established observability standards and best practices
Required Qualifications
  • 2–4 years of experience in observability or SRE
  • Working knowledge of metrics, logs, and basic tracing concepts
  • Hands‑on experience with at least one observability platform (Dynatrace, Elastic/ELK, Datadog, New Relic, etc.)
  • Basic understanding of SLIs/SLOs and service health indicators
  • Experience with cloud platforms or hybrid environments
  • Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting
Preferred Qualifications
  • Experience with OpenTelemetry or APM agents
  • Familiarity with Kubernetes or containerized workloads
  • Experience working with incident management tools (PagerDuty, ServiceNow)
  • Exposure to Dynatrace, Kibana/ELK, or similar cloud‑native monitoring tools
  • Experience in regulated or enterprise environments