Site Reliability Engineer (SRE) – Observability
Job in Toronto, Ontario, C6A, Canada
Listed on 2026-03-10
Listing for: Astra-North Infoteck Inc. ~ Conquering today’s challenges, achieving tomorrow’s vision!
Full Time position
Job specializations:
- IT/Tech
- IT Support, Systems Engineer, Cloud Computing, SRE/Site Reliability
Job Description:
Site Reliability Engineer (SRE) – Observability
Toronto – Hybrid (1–2 days in office)
We are looking for an Observability Engineer to help implement, operate, and improve observability capabilities across our applications and platforms. This role focuses on hands‑on onboarding, instrumentation, dashboarding, and alerting, working under established standards and guidance from senior engineers.
You will collaborate with application, SRE, and operations teams to ensure systems are observable, supportable, and production‑ready.
Key Responsibilities
Observability Implementation
- Implement and maintain metrics, logs, and traces for applications and infrastructure
- Assist with onboarding applications into observability platforms (e.g., Dynatrace, ELK, Datadog)
- Configure dashboards, alerts, and basic anomaly detection
- Work with development teams to enable structured logging, basic distributed tracing, and core metrics
- Validate observability requirements during Production Readiness Reviews (PRR)
- Troubleshoot missing or low‑quality telemetry
- Configure alerts based on golden signals (latency, errors, traffic, saturation)
- Help reduce alert noise by tuning thresholds and alert logic
- Support incident response by gathering logs, metrics, and traces
- Support root cause analysis using observability tools
- Maintain dashboards and documentation used by on‑call and support teams
- Participate in on‑call rotations (as applicable)
- Assist in automating observability onboarding and validation tasks
- Create and maintain reusable dashboards and alert templates
- Follow established observability standards and best practices
Qualifications
- 2–4 years of experience in observability or SRE
- Working knowledge of metrics, logs, and basic tracing concepts
- Hands‑on experience with at least one observability platform (Dynatrace, Elastic/ELK, Datadog, New Relic, etc.)
- Basic understanding of SLIs/SLOs and service health indicators
- Experience with cloud platforms or hybrid environments
- Ability to write scripts (Python, Bash, PowerShell) for automation and troubleshooting
- Experience with OpenTelemetry or APM agents
- Familiarity with Kubernetes or containerized workloads
- Experience working with incident management tools (PagerDuty, ServiceNow)
- Exposure to Dynatrace, Kibana/ELK, or similar cloud‑native monitoring tools
- Experience in regulated or enterprise environments