Cloud Site Reliability Engineer (Waterloo) - London area, Southwestern Ontario, Canada - IT/Tech

Position: Cloud Site Reliability Engineer (Waterloo)
Location: Southwestern Ontario

Overview

The Site Reliability Engineer (SRE) will apply deep expertise in DevOps practices, automation, infrastructure orchestration, configuration management, and continuous integration to support the delivery and operation of mission‑critical applications. This role will focus primarily on the development, deployment, and reliability of the xGPlatform and its associated peripheral services, and will play a key role in advancing Imagine Communications toward a robust, multitenant, multi‑cloud product strategy.

Responsibilities

Design, build, deploy, and operate applications and infrastructure across AWS, Azure, and other cloud service providers as required.
Manage and maintain development, staging, and production environments using infrastructure‑as‑code and automation best practices.
Design and implement systems and tooling that improve the reliability, scalability, security, and supportability of Imagine's Managed Services offerings.
Promote DevOps and cloud best practices within the team to improve quality, reduce operational risk, increase security, drive efficiency and reuse, and optimize costs.
Collaborate with product, architecture, and business stakeholders to understand user needs and translate them into reliable, scalable technical solutions.
Integrate and orchestrate diverse cloud services and internal systems using Web APIs and event‑driven architectures.
Architect, document, and review system designs with a strong focus on security, resiliency, and operational excellence.
Build and integrate cloud‑based services and automation to improve workforce productivity and reduce manual operational effort.
Partner with architecture and development teams to design reusable deployment patterns and establish governance and observability models.
Apply cloud compliance, security, and reliability standards to application and platform design.
Lead the investigation, troubleshooting, and resolution of Tier‑3 production incidents and escalations, contributing to root cause analysis and continuous improvement.

Qualifications

Bachelor's degree in Computer Science, Engineering or a related technical field.
6+ years of professional experience in Site Reliability Engineering, DevOps, Cloud Engineering, or Software Development roles supporting production systems.
Strong understanding of cloud architecture principles, including scalability, resiliency, high availability, security, and cost optimization.
Hands‑on experience designing, deploying, and operating applications and infrastructure in AWS and/or Azure.
Proficiency with infrastructure‑as‑code and cloud‑native technologies (e.g., Terraform, Ansible, Docker, Kubernetes, Prometheus, messaging or caching systems).
Extensive experience with monitoring, logging, and observability tools and practices.
Proven ability to troubleshoot and resolve complex production issues, including ownership of Tier‑3 incidents and root‑cause analysis.
Experience integrating systems using Web APIs, messaging, or event‑driven architectures.
Working knowledge of SQL and NoSQL databases, including schema design, querying, and operational considerations.
Experience working in Agile and DevOps environments.
Strong communication and collaboration skills, with the ability to work effectively across engineering, architecture, and business teams.

Preferred Qualifications

Experience operating and supporting mission‑critical, customer‑facing, or managed service platforms.
Experience leading or contributing to incident response, post‑incident reviews, and reliability improvements.
Familiarity with SRE practices such as service health indicators and reliability objectives.
Experience identifying and reducing operational toil through automation and process improvement.
Experience contributing to platform architecture decisions or reusable cloud deployment patterns.
Hands‑on experience with infrastructure and delivery tools such as Terraform, Ansible, or Azure DevOps.
Experience with scripting/programming languages such as Go, Node.js, PowerShell, Python, or Shell scripting is a strong plus.
Exposure to cost management, capacity planning, and performance optimization in cloud environments.
Familiarity with cloud security and compliance standards such as SOC 2.
Relevant industry certifications (or progress toward certification), such as AWS Certified Solutions Architect or DevOps Engineer.
Flexibility to adjust working hours to accommodate co‑workers and customers operating in different geographical regions.

Benefits

Medical, Dental, Vision and Life Insurance package.
Travel insurance covered.
Employee Wellbeing programs, including EAP and Wellness programs such as LifeSpeak and Vitality.
Paid volunteer time: volunteer in your community and the company will pay for that time.

Salary Range: $115,000 - $125,000 CAD

Imagine Communications is proud to be an equal opportunity workplace and is an affirmative action employer.
