Lead SRE - AWS, Terraform
Listed on 2025-12-30
IT/Tech
Systems Engineer, Cloud Computing
JPMorgan Chase
Lead Site Reliability Engineer in the CIB Markets Sales, Research and Data organization.
Job Description
Play a pivotal leadership role advising and supporting software engineering teams globally. Migrate and manage applications in the public cloud, promote SRE principles, and drive initiatives such as unified telemetry, application and infrastructure modernization, SLO/SLI onboarding, advanced deployment strategies, and performance and scalability improvements.
Job Responsibilities
- Design, code, test, and deliver software to automate manual operational work, including self‑healing and resiliency patterns.
- Define and implement a telemetry strategy, including rollout of application performance monitoring and cloud telemetry.
- Act as a culture carrier and adoption champion for site reliability, mentoring technologists within the organization.
- Troubleshoot priority incidents, facilitate blameless post‑mortems, and ensure permanent closure of incidents and related problem tasks.
- Engage with development teams throughout the SDLC and champion building software for reliability and scale, minimizing unnecessary refactoring.
- Identify application patterns and analytics to support better service level objectives.
- Design automated software and product upgrades, change management, and release management solutions.
- Provide comprehensive guidance, tools, and solutions to support the firm’s growth.
- Become an expert on applications and platforms in your remit, understand interdependencies, and drive evolution and debugging of critical components.
Qualifications
- Bachelor’s degree or equivalent experience in software engineering.
- Demonstrated experience with a major public cloud provider (Amazon Web Services) and infrastructure as code (Terraform).
- Advanced understanding of site reliability culture and principles, with a track record of implementing SRE concepts such as SLOs and error budgets.
- Advanced knowledge of observability capabilities—metrics, tracing, SLOs, alerting, telemetry collection—and the ability to design critical and golden signal monitoring and dashboards.
- Strong communication skills and a desire to mentor and educate others on site reliability engineering principles.
- Experience defining non‑functional standards and blueprints for supportability (logging, alerting, resiliency patterns, etc.).
- Working knowledge of infrastructure components (routers, load balancers, cloud products, container systems, compute, storage, networks).
- Ability to partner with architecture teams in defining non‑functional application supportability standards.
- Proven leadership skills with a drive for continuous improvement.
- AWS Cloud Certification, Linux Foundation CKA/CKAD, Terraform Associate or other relevant certifications are a plus.
J.P. Morgan is a global leader in financial services, providing strategic advice and products to the world’s most prominent corporations, governments, wealthy individuals, and institutional investors. We are an equal opportunity employer and place a high value on diversity and inclusion. We do not discriminate on the basis of any protected attribute, including race, religion, color, national origin, gender, sexual orientation, gender identity, age, marital or veteran status, pregnancy or disability.
We also provide reasonable accommodations for religious practices, mental health, and physical disability needs.