Production & Reliability Management Expert
Location: Montreal
Production & Reliability Management Expert (contract)
We are seeking a Production & Reliability Management Expert to drive operational excellence and system reliability for Identity & Access Management platforms within Morgan Stanley’s Cyber Data Risk & Resilience (CDRR) division. This role requires strong technical proficiency in Java, Python, web programming, automation, and database management, alongside expertise in production incident management, automation development, and Agile/SRE practices. You will be responsible for managing critical incidents, reducing operational overhead through automation, and ensuring production alignment within the Agile software development process.
Key Responsibilities
- Manage critical production incidents, ensuring timely communication and resolution with key management and business stakeholders.
- Embed Production Management practices into Agile development processes, ensuring code meets production standards.
- Reduce operational support costs by eliminating issues, optimizing workflows, and implementing automation tools.
- Lead incident calls, coordinating multiple teams toward resolving impactful outages.
- Identify and prioritize technical debt that risks system stability or creates operational inefficiencies.
- Analyze business processes to identify automation opportunities and develop, test, and deploy automation solutions.
- Integrate automation solutions into existing systems and infrastructure.
- Monitor, troubleshoot, and resolve automation issues.
- Collaborate with stakeholders to gather requirements and deliver aligned solutions.
- Work within DevOps, Agile, Scrum, and SRE principles to enhance reliability and delivery.
Qualifications
- Proficiency in Java for building medium to large-scale, multi-threaded applications.
- Strong scripting skills in Python and Shell.
- Experience with web programming and REST/SOAP APIs.
- Database expertise in SQL, DB2, Sybase, or Snowflake, including reporting.
- Experience creating automated test suites, executing SDLC workflows, and managing automated deployments.
- System knowledge in Unix/Linux environments, including infrastructure setups such as load balancing.
- Familiarity with Ansible, GitHub, or similar configuration/release management tools.
- Strong incident management and crisis coordination skills.
- Ability to work closely with developers and business stakeholders to align production operations with Agile delivery.
- Process-driven approach with a focus on optimization, automation, and efficiency.
- Skilled at translating business requirements into technical solutions.
- Adept at working in fast-paced, high-stakes environments with a focus on security, compliance, and operational resilience.
Skills
- Production & reliability management
- Incident response and resolution
- Identity & Access Management (IAM) systems
- Automation development and integration
- Agile, DevOps, and SRE methodologies
- Java
- Python
- Shell scripting
- Web programming (REST/SOAP APIs)
- Automated testing frameworks
- SDLC deployment tools
- GitHub
- Ansible
- Unix/Linux systems
- DB2, Sybase, Snowflake
- SQL
- Load balancing infrastructure
- Analytical and problem-solving mindset
- Strong communication and stakeholder management
- Self-motivated and proactive ownership of issues
- Ability to work effectively in cross-functional teams
- Adaptability and quick learning in dynamic environments
The pay range that the employer in good faith reasonably expects to pay for this position is $58.80/hour - $91.87/hour. Our benefits include medical, dental, vision, and retirement benefits. Applications will be accepted on an ongoing basis.
Tundra Technical Solutions is among North America’s leading providers of Staffing and Consulting Services. Our success and our clients’ success are built on a foundation of service excellence. We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.
Qualified applicants with…