Site Reliability Engineer - Core Platform Services
Listed on 2025-12-20
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
Company Profile
At Morgan Stanley, we advise, originate, trade, manage and distribute capital for governments, institutions and individuals, and always do so with a standard of excellence. We are a leading global financial services firm that conducts its business through three principal business segments—Institutional Securities, Wealth Management (WM), and Investment Management. The Firm's employees serve clients worldwide from more than 1,200 offices in 43 countries.
WM Business Overview
Our WM business is one of the largest in the world with more than $2 trillion in client assets, $73 billion in lending balances, and nearly 16,000 Financial Advisors in 600+ offices across the U.S. Our Financial Advisors focus on delivering timely, customized solutions and services that help clients meet their financial and life goals. Our offering includes brokerage and investment advisory services, financial and wealth planning, access to credit and lending, cash management, annuities and insurance, and retirement services.
Department Profile
Reliability Operations is responsible for risk mitigation, stability, performance, and efficiency across Wealth Management Technology. Through Production Operations, Observability Engineering, Resiliency Assessment & Validation, and Reliability Engineering, we improve Wealth Management stability, reliability, resiliency, efficiency, and performance. If you are an exceptional individual who is interested in solving complex problems and building sophisticated solutions in a dynamic team environment, Reliability Operations is the place for you.
The ‘Site Reliability Engineer’ role is within the Core Platform Services Super Department in Wealth Management Technology.
We are looking for a Site Reliability Engineer at the Associate, Director and Vice President levels. The position in the Reliability Operations team is focused on delivering exceptional services to both business unit (BU) and development partners to minimize or avoid production outages. The role will focus on production support, automating deployments, and working with the agile teams to build and support stable and reliable production systems.
The ideal candidate will be passionate about automation and skilled in one of the following programming languages:
Python, Perl, shell, Ruby, Java, C#, or similar. The candidate should possess a strong understanding of database concepts, job schedulers, MQ, web services, and UNIX/Linux/Windows operating systems, as well as experience with debugging applications. We are looking for a strong leader with excellent communication skills who is committed to continuously improving and delivering results. The candidate should be organized, disciplined, detail-oriented, self-motivated, and delivery-focused.
- Maintain applications once they are live by measuring and monitoring availability, latency, and overall system health with a focus on business activities, and continuously evaluate cost and toil.
- Engage in and improve the whole lifecycle of services from inception and design, through deployment, operation, capacity planning and launch reviews.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity; this includes automation for various other operational needs.
- Troubleshoot infrastructure issues, review log files, update documentation, and maintain a knowledge base of resolutions.
- Work closely with the application Development team to understand the platform and create tools/utilities to help with production management.
- Work with upstream data providers and downstream consumers, and reduce the number of escalations to development teams.
- Develop scripts and assist with code changes along with operational tasks/activities.
- Work closely with Application Development to ensure that the support team has excellent knowledge of the application set; own and maintain the support knowledge base and documents.
- Use analytical skills to find trends in the environment and drive out problems.
- Lead effort to determine improvement areas to stabilize the plant.
- Identify risks and work with a sense of urgency, working within a team or…