Site Reliability Engineer Lead
Job in Lakeville, Dakota County, Minnesota, 55044, USA
Listed on 2026-07-03
Listing for: Dormont Manufacturing Co
Full Time position
Job specializations:
- IT/Tech
SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, Systems Engineer
Job Description & How to Apply Below
Site Reliability Engineer Lead – Pennington, NJ
Bank of America, National Association
Annualized salary: $ – $
Shift: 1st shift
Role Overview
Lead a Site Reliability Engineering team to partner with application development and production support teams. Establish and maintain instrumentation, tooling, ticketing, alerting, and on‑call routines for key services. Demonstrate high technical expertise across domains and decompose complex objectives into actionable work units.
Responsibilities
- Collaborate with Development and Infrastructure teams to implement monitoring capabilities based on design specifications.
- Develop, maintain, and mentor others on reliability scripts, tools, and libraries for instrumentation, automation, and operational needs.
- Guide application teams in adopting common reliability libraries and tools.
- Participate in architecture community meetings and communicate best practices.
- Identify vulnerabilities and opportunities for reliability improvement; analyze low‑level error rates and noise in monitoring.
- Serve as a subject‑matter expert during major incident triage and failure scenario modeling.
- Define and maintain a multi‑year stability roadmap aligned with business and technology strategy.
- Identify critical dependencies, risks, and mitigation strategies across infrastructure, applications, and services.
- Work with architects to enforce enterprise patterns that enhance reliability and fault tolerance.
- Ensure designs adhere to high availability, disaster recovery, and performance optimization best practices.
- Establish stability metrics, KPIs, and compliance standards for technology teams.
- Drive adoption of reliability engineering principles across development and operations.
- Collaborate with engineering, operations, and product teams to embed stability into the software development lifecycle.
- Act as a trusted advisor to senior leadership on stability initiatives and investments.
- Monitor emerging technologies and industry trends to enhance stability strategies.
- Lead post‑incident reviews and integrate lessons learned into future designs.
Qualifications
- 8+ years in technology architecture, reliability engineering, or infrastructure strategy roles.
- Proven track record delivering stability‑focused initiatives in large‑scale environments.
- Strong knowledge of distributed systems, cloud architecture (AWS, Azure, GCP), and microservices.
- Experience with reliability engineering, chaos testing, and observability tools.
- Ability to influence cross‑functional teams and communicate complex concepts to non‑technical stakeholders.
- SRE Certification.
Skills
- Automation
- Influence
- Production Support
- Result Orientation
- Analytical Thinking
- Application Development
- Solution Design
- Stakeholder Management
- DevOps Practices
- Project Management
- Solution Delivery Process
This role is eligible for the annual discretionary plan, based on performance and company success. It also comes with industry‑leading benefits, paid time off, and support resources to contribute to business growth and community impact.