Senior Site Reliability Engineer (SRE)
Listed on 2025-12-02
IT/Tech
Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Location: Montreal
Company: Intelcom | Dragonfly
Responsibilities:
- Incident Management: Detect and respond to issues, ensuring rapid recovery to minimize downtime. Define and implement an escalation process, coordinate communication across stakeholders, document incident reports, and conduct post‑mortems for continuous improvement.
- Collaboration: Work closely with development and operations teams to ensure smooth deployment and operation of applications. Provide primary operational support and engineering for large‑scale distributed software, improve services through rigorous testing and release procedures, and participate in system design consulting, platform management, and capacity planning.
- Influence: Create sustainable systems and services through automation and enhancements; promote a culture of innovation and continuous improvement. Mentor and coordinate the SRE team, fostering professional growth and establishing operational policies that promote agility and scalability.
- Automation: Automate repetitive tasks to improve efficiency and reduce human error, enhancing the reliability, quality, and time‑to‑market of software solutions.
- Monitoring and Alerting: Implement and enhance monitoring systems (e.g., Datadog) to track health and performance, maintain high availability, gather metrics, and develop dashboards for stakeholder visibility.
- Disaster Recovery: Prepare and implement disaster recovery plans to manage unexpected outages.
- Performance Optimization: Continuously improve system performance and scalability.
- Capacity Planning: Ensure infrastructure can handle current and future demands.
- Chaos Engineering: Intentionally introduce failures to test system resilience and improve robustness.
Qualifications:
- Bachelor's degree in software engineering, computer science, or equivalent.
- Minimum of 7 years of experience in cloud management, development, and/or SRE responsibilities.
- Experience in Agile methodology and technical project execution.
- Knowledgeable in DevOps concepts, AWS, Azure, GCP, observability tools (Datadog, Cloudflare), Terraform, PagerDuty, and the integration of these technologies.
- Strong initiative and resilience, with a demonstrated ability to explore new ideas and innovative approaches to solving complex problems.
- Excellent interpersonal and communication skills in both French and English.
- Comfortable working in a fast‑moving environment.
Schedule:
Primarily daytime hours, but on‑call availability is required for the initial months to observe and refine existing processes.
Intelcom is a leading last-mile logistics company in the e-commerce sector. Our teams across Canada, along with our network of independent contractors, contribute to Intelcom's daily operations.
Our goal is simple: in a constantly evolving industry, we do not just keep up, we take the lead. In addition to standing out through innovative delivery methods and services, Intelcom is undergoing a technological transformation in which the integration of customer experience and logistics technologies is at the heart of its evolution.
At Intelcom, we know that experience comes in many forms, and we are committed to creating a culture where difference is valued. We are always looking for talented and diverse people to join our teams. With more than 60 delivery centres across Canada, we may have the right opportunity for you.
Apply today.