Site Reliability Engineer Lead Job, Lakeville area, Minnesota, USA, IT/Tech

Site Reliability Engineer Lead – Pennington, NJ

Bank of America, National Association

Annualized salary: $ – $

Shift: 1st shift

Role Overview

Lead a Site Reliability Engineering team that partners with application development and production support teams. Establish and maintain instrumentation, tooling, ticketing, alerting, and on‑call routines for key services. Demonstrate deep technical expertise across domains and break complex objectives down into actionable units of work.

Responsibilities

Collaborate with Development and Infrastructure teams to implement monitoring capabilities based on design specifications.
Develop, maintain, and mentor others on reliability scripts, tools, and libraries for instrumentation, automation, and operational needs.
Guide application teams in adopting common reliability libraries and tools.
Participate in architecture community meetings and communicate best practices.
Identify vulnerabilities and opportunities for reliability improvement; analyze low‑level error rates and noise in monitoring.
Serve as a subject‑matter expert during major incident triage and failure scenario modeling.
Define and maintain a multi‑year stability roadmap aligned with business and technology strategy.
Identify critical dependencies, risks, and mitigation strategies across infrastructure, applications, and services.
Work with architects to enforce enterprise patterns that enhance reliability and fault tolerance.
Ensure designs adhere to high availability, disaster recovery, and performance optimization best practices.
Establish stability metrics, KPIs, and compliance standards for technology teams.
Drive adoption of reliability engineering principles across development and operations.
Collaborate with engineering, operations, and product teams to embed stability into the software development lifecycle.
Act as a trusted advisor to senior leadership on stability initiatives and investments.
Monitor emerging technologies and industry trends to enhance stability strategies.
Lead post‑incident reviews and integrate lessons learned into future designs.

Required Qualifications

8+ years in technology architecture, reliability engineering, or infrastructure strategy roles.
Proven track record delivering stability‑focused initiatives in large‑scale environments.
Strong knowledge of distributed systems, cloud architecture (AWS, Azure, GCP), and microservices.
Experience with reliability engineering, chaos testing, and observability tools.
Ability to influence cross‑functional teams and communicate complex concepts to non‑technical stakeholders.

Desired Qualifications

SRE Certification.

Skills

Automation
Influence
Production Support
Result Orientation
Analytical Thinking
Application Development
Solution Design
Stakeholder Management
DevOps Practices
Project Management
Solution Delivery Process

Benefits

This role is eligible for the annual discretionary plan, based on performance and company success. It also comes with industry‑leading benefits, paid time off, and support resources that help employees contribute to business growth and community impact.
