Site Reliability Engineer
Listed on 2025-12-02
IT/Tech
Cloud Computing, IT Support, SRE/Site Reliability, Systems Engineer
Site Reliability Engineer (SRE), ServiceNow, Application Infrastructure
Location:
Montreal – Hybrid – 3 days/week
The Application Infrastructure (AI) department is seeking a Site Reliability Engineer (SRE) to help drive reliability engineering, operations and customer support services for the client’s ServiceNow SaaS implementation. The role reports to a Site Reliability Engineering & Operations Lead.
This position specializes in ServiceNow, a Software as a Service (SaaS) platform that provides a suite of IT service management capabilities. It is integrated with many products, such as chatbot technology and on-call escalation for incident management, and with a range of on-premises infrastructure (including SQL databases, APIs, and web infrastructure). Although the role focuses on value-add development and process delivery, it is also a production-side, operational role that requires participation in an on-call rotation from time to time.
Successful candidates for SRE roles in Application Infrastructure have come from a variety of backgrounds: for example, a developer looking to evolve site reliability as a practice, an infrastructure specialist with an interest in reliability and resilience principles, or a strong system administrator who enjoys troubleshooting and has some task automation experience.
Prior experience in the financial services industry is not required, and we welcome candidates from all industries and backgrounds to apply.
Responsibilities
- Delivering improvements that maximize the availability and performance of supported systems through optimized and automated operational tasks, and collaborating with colleagues on operational tool development, ongoing problem management, and architecture reviews.
- Troubleshooting ServiceNow issues and, from time to time, on-premises capabilities in a Linux environment; collaborating with others to get to the root cause of issues and agreeing on lasting improvements.
- Exploring and delivering observability (metrics, logging, tracing and alerting) that defines and measures a product’s target reliability.
- Being dependable and responsive during agreed hours, such as when participating in the on-call rotation with the rest of the global team (with time off in lieu).
- Committing to understanding the Firm’s ServiceNow instances and their dependencies, and contributing to their documentation.
- Identifying and prioritizing technical debt that can impact client satisfaction or operational efficiency.
- Giving feedback on policies and procedures for the delivery of SRE and operational practices, with a view to continually making the Firm safer and more efficient.
Qualifications
- The ideal candidate would have at least one of: software development skills in one or more programming languages (e.g., Python), or ServiceNow administration or development experience.
- 10+ years of experience
- Proficient oral and written communication skills
- Establishing warm, effective relationships with colleagues to collaborate on successful delivery
- A dependable team worker with demonstrated commitment to client service
- Ability to respond appropriately during occasional technical emergencies, like outages.
- Open to working in an on-call rotation
- ServiceNow administration or development experience, although the successful candidate can acquire this through on-the-job experience and training.
Seniority level: Mid-Senior level
Employment type: Contract
Job function: Information Technology
Industries: Banking, Financial Services, and Capital Markets