Site Reliability Engineer (SRE) – Observability

Job in Toronto, Ontario, C6A, Canada

Listing for: Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!

Full Time position
Listed on 2026-03-10

Job specializations:

IT/Tech
IT Support, Systems Engineer, Cloud Computing, SRE/Site Reliability

Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 CAD yearly

Position: Site Reliability Engineer (SRE) – Observability

Job Description:

Toronto - Hybrid (1-2 days office)

We are looking for an Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms. This role focuses on hands‑on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers.

You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production‑ready.

Key Responsibilities

Observability Implementation

Implement and maintain metrics, logs, and traces for applications and infrastructure
Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog)
Configure dashboards, alerts, and basic anomaly detection

Application Support & Instrumentation

Work with development teams to enable structured logging, basic distributed tracing, and core metrics
Validate observability requirements during Production Readiness Reviews (PRR)
Troubleshoot missing or low‑quality telemetry

Monitoring & Alerting

Configure alerts based on golden signals (latency, errors, traffic, saturation)
Help reduce alert noise by tuning thresholds and alert logic
Support incident response by gathering logs, metrics, and traces

Operations & Reliability

Support root cause analysis using observability tools
Maintain dashboards and documentation used by on‑call and support teams
Participate in on‑call rotations (as applicable)

Automation & Continuous Improvement

Assist in automating observability onboarding and validation tasks
Create and maintain reusable dashboards and alert templates
Follow established observability standards and best practices

Required Qualifications

2–4 years of experience in observability or SRE
Working knowledge of metrics, logs, and basic tracing concepts
Hands‑on experience with at least one observability platform (Dynatrace, Elastic/ELK, Datadog, New Relic, etc.)
Basic understanding of SLIs/SLOs and service health indicators
Experience with cloud platforms or hybrid environments
Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting

Preferred Qualifications

Experience with OpenTelemetry or APM agents
Familiarity with Kubernetes or containerized workloads
Experience working with incident management tools (PagerDuty, ServiceNow)
Exposure to Dynatrace, Kibana/ELK, or similar cloud‑native monitoring tools
Experience in regulated or enterprise environments
