Site Reliability Engineer/Ingénieur Fiabilité des Sites Job, Montréal area, Montreal, Province de Québec, Canada, IT/Tech

Position: Site Reliability Engineer / Ingénieur Fiabilité des Sites
Location: Montreal

On behalf of our banking client, Procom is seeking a Site Reliability Engineer for a 12-month contract, working 3 days per week at our client's office in Montreal.

Job Title: Site Reliability Engineer
Experience Level: Level 4 (advanced): 7-15 years
Location: Montreal (Day 1 onboarding onsite / in office presence 3x week)

The Private Cloud SRE L3 team is part of the Enterprise Computing organization within the bank. The team has a presence in cities around the world and focuses on supporting cloud and container-based platforms for internal and external clients. You will integrate with the global follow-the-sun operations model, taking responsibility for the technologies the team supports in your region.
Team members frequently interact with engineering teams and collaborate on the testing and certification of software deployed to the platform.

Primary Responsibilities:

Provide L3 support for the bank's private cloud, including on-call rotation

Work closely with the internal engineering team and provide input on testing of new component releases and infrastructure upgrades, as well as performance, capacity, and monitoring

Create and improve support processes, including training, documentation, customer engagement, automation and scripting, and incident, problem, and change management

Work together with L2 teams and other L3 team members internationally

Qualifications
Required Skills:

5 to 10 years of relevant experience

3 to 5 years of Linux experience

Experience in front-end and back-end development with Golang

Sound knowledge of server infrastructure, virtualization, cloud computing

Proven Kubernetes and Docker experience

Excellent understanding of internet and networking protocols, including TCP/IP, HTTP/HTTPS

Strong understanding of security protocols, e.g. SSL/TLS, Kerberos

Strong organizational skills and ability to manage multiple tasks and high-pressure situations for outage resolution

Experience with Agile and DevOps/SRE concepts

Administrative competence in at least one major scripting language or platform (for example, Python)

Ability to communicate effectively with various user groups, e.g. developers and engineers, as well as remote team members

Willingness to work in the on-call rotation (every 5 weeks)

Nice to have:

Knowledge of system monitoring in cloud environments, including cloud-specific products and tools

Experience in developing monitoring architecture and implementing monitoring agents, dashboards, and alerts

Experience operating in large, enterprise environments

Experience with maintaining high-availability production systems

Experience in enterprise-level hosting environments, in particular cloud and container technologies

