Site Reliability Engineer

Job in Ottawa, Ontario, Canada
Listing for: E-IT
Full Time position
Listed on 2026-06-13
Job specializations:
  • IT/Tech
    Systems Engineer, Cybersecurity, IT Support, Cloud Computing
Salary/Wage Range or Industry Benchmark: 100,000 - 125,000 CAD yearly
Position: Site Reliability Engineer

Key Responsibilities

  • Incident Management and Reliability: Lead the incident management process, ensuring high availability and performance of the applications. Develop and implement SRE practices to improve system reliability and resilience.
  • Monitoring and Observability: Utilize Dynatrace, Splunk, and Grafana to monitor system health, detect anomalies, and provide actionable insights for performance optimization.
  • Root Cause Analysis: Conduct thorough root cause analysis of incidents and outages, developing long-term solutions to prevent recurrence.
  • DevOps Practices: Collaborate with development and operations teams to streamline CI/CD pipelines, automate workflows, and implement infrastructure as code (IaC) for efficient service deployment and management.
  • Networking Expertise: Provide expertise in networking technologies (Cisco, Arista, AVI, etc.), ensuring robust network infrastructure design, implementation, and troubleshooting. Utilize tools like Wireshark for in-depth network analysis and debugging.
  • Collaboration and Leadership: Work closely with cross-functional teams to share knowledge, mentor junior engineers, and lead by example in adopting best practices in SRE, DevOps, and networking.
  • Innovation and Continuous Improvement: Stay abreast of industry trends and new technologies, advocating for and implementing innovative solutions to enhance system reliability and performance.

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Information Technology, or related field.
  • 10+ years of experience in an SRE/DevOps role, with a proven track record in managing high-availability systems.
  • Solid expertise in monitoring and observability tools (Dynatrace, Splunk, Grafana).
  • Proficient in network debugging and analysis tools, including Wireshark.
  • Solid understanding of on-prem and hybrid cloud infrastructure (VMware, Linux, Windows, Azure) and container orchestration (Kubernetes, Docker).
  • Certifications in relevant technologies (Dynatrace, Splunk) are a plus.
  • Excellent communication and leadership skills, capable of leading incident response initiatives and collaborating effectively across teams.
  • Excellent problem-solving skills, with the ability to conduct comprehensive root cause analysis and troubleshooting.