Site Reliability Engineer
Listed on 2025-12-02
IT/Tech
Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Client’s Application Infrastructure (AI) division is seeking a Site Reliability Engineer (SRE) to join the Client Development Environment team. This role is focused on driving reliability, operational efficiency, and support for core development lifecycle tools used by over 17,000 developers across the firm. The ideal candidate will play a critical role in scaling and maintaining high-performing systems, ensuring system resilience, and working closely with developers to maximize productivity while minimizing manual operational effort.
Job Responsibilities:
- Gain and maintain full-stack knowledge of Morgan Stanley’s development environment
- Ensure maximum availability and performance of systems through architecture reviews, problem management, and plant optimization
- Automate plant management tasks and develop tools to reduce operational effort and support costs
- Identify and address technical debt that impacts developer productivity or system reliability
- Collaborate with other SREs across Application Infrastructure to implement shared solutions
- Troubleshoot complex issues across the full development stack
- Enhance Ops team product knowledge to reduce issue escalation rates
- Consult with internal developer clients to help troubleshoot and optimize use of Client tooling
- Experiment with emerging technologies, tools, and techniques to improve operations
- Participate in a global on-call rotation with compensatory time off
- Champion operational responsiveness and a strong culture of reliability and automation
Qualifications:
- Programming/scripting experience for task automation (Python preferred)
- Hands-on experience with observability tools like Prometheus and Grafana
- Experience with version control (Bitbucket, GitHub), issue tracking (Jira), and CI tools (Jenkins, GitHub Actions, Azure DevOps)
- Familiarity with automated testing and deployment pipelines
- Strong interpersonal and communication skills
- Proven collaboration capabilities within technical stakeholder groups
Skills:
- Familiarity with SRE principles such as SLOs, error budgets, toil reduction, and blameless postmortems
- Experience with containerization technologies such as Docker and orchestration tools like Kubernetes
- Prior exposure to large-scale development environments or developer tooling platforms
- Relevant certifications in Linux, Python, Kubernetes, or SRE practices are a plus
Education:
Bachelor’s degree in Computer Science, Engineering, or a related field (preferred)