Site Reliability Engineer Job Montréal area,Montreal Province de Québec Canada,IT/Tech

Location: Montreal

Client is seeking an experienced Site Reliability Engineer (SRE) to support and enhance the reliability, performance, and operational efficiency of our global ServiceNow SaaS platform. As part of the Application Infrastructure (AI) team, you will be instrumental in advancing SRE practices, ensuring seamless integration and stability across on-premises infrastructure and cloud systems. This role combines software development, automation, systems engineering, and operations in a highly collaborative environment.

This is a hybrid role with both development-focused and production operational responsibilities, including periodic on-call participation.

Key Responsibilities

Drive automation and reliability improvements to reduce operational overhead and increase system availability
Troubleshoot ServiceNow issues and occasionally resolve Linux-based infrastructure problems
Develop and maintain observability tools including metrics, logging, tracing, and alerting to track and enhance system health and performance
Collaborate with global SRE peers to deliver reliable and resilient ServiceNow capabilities
Identify, document, and prioritize technical debt and propose long-term solutions to reduce recurring issues
Contribute to the design and documentation of the ServiceNow ecosystem, including integrations with SQL databases, APIs, and web platforms
Participate in on-call rotation and respond effectively to technical incidents or outages
Provide input to policies and procedures with the goal of improving security, efficiency, and operational consistency
Champion a culture of continuous improvement, resilience, and operational excellence

Required Qualifications

Minimum of 7 years of professional experience in software development, system administration, or site reliability engineering
Experience in at least one of the following areas:
ServiceNow administration or development
Strong troubleshooting skills and a proactive approach to problem-solving
Familiarity with Linux systems, shell scripting, and general infrastructure support
Effective verbal and written communication skills
Demonstrated ability to collaborate and build strong working relationships in a team environment
Willingness to participate in an on-call rotation and respond to critical incidents when needed

Preferred Qualifications

Direct experience with ServiceNow (administration or development)
Exposure to observability tools (e.g., Prometheus, Grafana, ELK, Splunk)
Familiarity with DevOps/SRE best practices and tools
Experience with infrastructure automation (e.g., Ansible, Terraform)
Knowledge of incident management, capacity planning, and monitoring frameworks

Certifications (if any)

ServiceNow certifications (Administrator, Developer) are a plus but not required
Relevant certifications in Linux, DevOps, or SRE disciplines are desirable
