Site Reliability Engineer
Job in Montreal, Québec, Canada
Listed on 2025-12-02
Listing for: Compunnel, Inc.
Full Time position
Job specializations:
- IT/Tech
Cloud Computing, Systems Engineer, IT Support, SRE/Site Reliability
Job Description & How to Apply Below
The client is seeking an experienced Site Reliability Engineer (SRE) to support and enhance the reliability, performance, and operational efficiency of its global ServiceNow SaaS platform. As part of the Application Infrastructure (AI) team, you will be instrumental in advancing SRE practices and ensuring seamless integration and stability across on-premises infrastructure and cloud systems. This role combines software development, automation, systems engineering, and operations in a highly collaborative environment.
This is a hybrid role with both development-focused and production operational responsibilities, including periodic on-call participation.
Key Responsibilities
- Drive automation and reliability improvements to reduce operational overhead and increase system availability
- Troubleshoot ServiceNow issues and occasionally resolve Linux-based infrastructure problems
- Develop and maintain observability tools including metrics, logging, tracing, and alerting to track and enhance system health and performance
- Collaborate with global SRE peers to deliver reliable and resilient ServiceNow capabilities
- Identify, document, and prioritize technical debt and propose long-term solutions to reduce recurring issues
- Contribute to the design and documentation of the ServiceNow ecosystem, including integrations with SQL databases, APIs, and web platforms
- Participate in on-call rotation and respond effectively to technical incidents or outages
- Provide input to policies and procedures with the goal of improving security, efficiency, and operational consistency
- Champion a culture of continuous improvement, resilience, and operational excellence
Required Qualifications
- 7+ years of professional experience in software development, system administration, or site reliability engineering
- Experience in at least one of the following areas:
  - ServiceNow administration or development
- Strong troubleshooting skills and a proactive approach to problem-solving
- Familiarity with Linux systems, shell scripting, and general infrastructure support
- Effective verbal and written communication skills
- Demonstrated ability to collaborate and build strong working relationships in a team environment
- Willingness to work in an on-call rotation and respond to critical incidents when needed
Preferred Qualifications
- Direct experience with ServiceNow (administration or development)
- Exposure to observability tools (e.g., Prometheus, Grafana, ELK, Splunk)
- Familiarity with DevOps/SRE best practices and tools
- Experience with infrastructure automation (e.g., Ansible, Terraform)
- Knowledge of incident management, capacity planning, and monitoring frameworks
- ServiceNow certifications (Administrator, Developer) are a plus but not required
- Relevant certifications in Linux, DevOps, or SRE disciplines are desirable