Site Reliability Engineer, Group Technology
Listed on 2025-12-24
IT/Tech
Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Join our Group Benefits Engineering Team!
We are passionate about building software that solves problems. We count on our team to empower our users with a rich feature set, high availability, and stellar performance so they can pursue their missions. The successful applicant will be involved with application support, application deployment, business requirements gathering, and data analysis, all in support of solutions that conform to Manulife standards. The candidate will need a breadth and depth of experience that allows them to devise and support solutions that integrate appropriately within the organization and its application frameworks.
Position Responsibilities
- Solve and optimize systems and processes.
- Understand the business drivers and analytical use-cases.
- Address area-level risks, and provide and implement mitigation plans.
- Report on area readiness and quality, and raise red flags in crisis situations.
- Monitor the production environment, taking a complete view of system health.
- Track and produce metrics for the team and develop strategies to increase efficiency.
- Maintain software and systems to run the applications.
- Improve reliability, quality, and time-to-market of our suite of software solutions.
- Measure and optimize system performance, with an eye toward pushing our capabilities forward, anticipating customer needs, and innovating to continually improve.
- Provide primary operational support for multiple large, distributed software applications.
- Work with business clients and internal and external teams to debug or tackle application issues.
- Be flexible in providing on-call support to resolve issues as the need arises.
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
- Partner with development teams to improve services through thorough testing and release procedures.
- Participate in system design consulting, platform management, and planning.
- Create sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service level objectives.
Qualifications
- Bachelor’s degree or equivalent experience in computer science or other technical field.
- Minimum 3 to 5+ years of relevant technical experience.
- Ability to solve problems by identifying and resolving root causes.
- Ability to program (structured and OO) with one or more high-level languages or frameworks, such as JavaScript, Node.js, React, or similar.
- Experience with one or more of the following: scheduling (CA, CA WLA), SQL queries and scripting, Excel, Informatica development (or related ETL tools), shell scripting/PowerShell/UNIX, Windows/batch scripting.
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN).
- Experience with Agile (Scrum or Kanban), Jira, and ServiceNow.
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
- Previous success in technical engineering or application support.
- Coding experience beyond simple scripts.
- Knowledge of site reliability engineering (SRE) concepts.
- Experience with monitoring and alerting applications such as Moogsoft, xMatters, Azure Data Explorer, New Relic, or other similar tools.
We’ll empower you to learn and grow the career you want.
We’ll recognize and support you in a flexible environment where well-being and inclusion are more than…