Site Reliability Engineer/Ingénieur Fiabilité des Sites
Job in Montreal, Montréal, Province de Québec, Canada
Listing for: Procom
Full Time, Part Time, Contract position
Listed on 2026-02-22
Job specializations:
- IT/Tech: Cloud Computing, Systems Engineer, SRE/Site Reliability, Network Engineer
Position: Site Reliability Engineer / Ingénieur Fiabilité des Sites
Location: Montreal
On behalf of our banking client, Procom is seeking a Site Reliability Engineer for a 12-month contract, working 3 days/week at our client's office in Montreal.
Job Title: Site Reliability Engineer
Experience Level: Level 4 (advanced): 7-15 years
Location: Montreal (Day 1 onboarding onsite / in office presence 3x week)
The Private Cloud SRE L3 team is part of the Enterprise Computing organization within the bank. The team has a presence in cities around the world and supports cloud and container-based platforms for internal and external clients. You will integrate with the global follow-the-sun operations model, which means each region is responsible for the technologies the team supports during its own working hours.
Team members frequently interact with engineering teams and collaborate on the testing and certification of software deployed to the platform.
Primary Responsibilities:
- Provide L3 support for the bank's private cloud, including on-call rotation
- Work closely with the internal engineering team and provide input on testing of new component releases and infrastructure upgrades, as well as performance, capacity, and monitoring
- Create and improve processes for support, including training, documentation, customer engagement, automation and scripting, and incident, problem, and change management
- Work together with L2 teams and other L3 team members internationally
Qualifications
Required Skills:
- 5 to 10 years of relevant experience
- 3 to 5 years of Linux experience
- Experience in front-end and back-end development with Golang
- Sound knowledge of server infrastructure, virtualization, and cloud computing
- Proven Kubernetes and Docker experience
- Excellent understanding of internet and networking protocols, including TCP/IP and HTTP/HTTPS
- Strong understanding of security protocols, e.g. SSL/TLS, Kerberos
- Strong organizational skills and ability to manage multiple tasks and high-pressure situations for outage resolution
- Experience with Agile and DevOps/SRE concepts
- Administrative competence in at least one major scripting language or platform (for example, Python)
- Ability to communicate effectively with various user groups, e.g. developers and engineers, as well as remote team members
- Willingness to work in an on-call rotation (every 5 weeks)
Nice to have:
- Knowledge of system monitoring in cloud environments, including cloud-specific products and tools
- Experience in developing monitoring architecture and implementing monitoring agents, dashboards, and alerts
- Experience operating in large, enterprise environments
- Experience with maintaining high-availability production systems
- Experience in enterprise-level hosting environments, in particular cloud and container technologies