Lead Site Reliability Engineer Job, Boulder area, Colorado, USA, IT/Tech

Creating Peace of Mind by Pioneering Safety and Security

At Allegion, we help keep the people you know and love safe and secure where they live, work and visit. With more than 30 brands, 12,000+ employees globally and products sold in 130 countries, we specialize in security around the doorway and beyond. Additionally, in 2024 we were awarded the Gallup Exceptional Workplace Award, which recognizes the most engaged workplace cultures in the world.

Lead Site Reliability Engineer

Allegion is looking for a Lead Site Reliability Engineer to work as part of a highly engaged small team within a global organization of 12,000+ employees, representing 30+ brands (including Schlage, Von Duprin and LCN) focused on safety, security and access management. You’ll work on solutions that enable seamless access and help keep you and your loved ones safe and secure wherever you work, live and thrive.

Allegion is seeking a highly motivated Lead Site Reliability Engineer to lead our SRE team in designing solutions that extend security technology. The ideal candidate has proven expertise in leading, designing, developing and deploying scalable, robust systems using cloud technologies.

What you’ll do:

Provide technical leadership and mentorship to a team of Site Reliability Engineers, promoting best practices in system architecture, reliability, and cloud security.
Design, implement, and manage high-availability and fault-tolerant systems using Java, Spring, AWS, and cloud security best practices.
Work with development teams to ensure that systems are designed with resiliency and security in mind.
Implement and manage monitoring, alerting, and logging solutions to track performance, availability, and security metrics across infrastructure and applications.
Troubleshoot production issues related to performance, scaling, and security, ensuring that issues are resolved in a timely manner with minimal impact.
Drive automation initiatives across infrastructure, security, and monitoring tasks, aiming to reduce manual intervention and improve efficiency.
Collaborate with cross-functional teams to design disaster recovery plans, backup strategies, and business continuity plans.
Write clean, efficient, and well-documented code that adheres to software development best practices and coding standards.
Stay updated with the latest industry trends, technologies, and best practices in software development and apply them to enhance our software applications.
Fix application bugs, validate the fixes in lower environments, and promote them to the production environment using CI/CD pipelines.
Collaborate with software engineering teams to build and deploy applications using best practices in reliability, observability, scalability, and security.
Develop and implement automation tools and frameworks to streamline operational processes, reduce manual intervention, and improve efficiency.
Build dashboards to measure KPIs and SLOs with a single-pane-of-glass mindset.
Participate in on-call rotations and respond to incidents, ensuring timely resolution and minimal impact on users, thereby meeting SLOs/SLAs.

What we’re looking for:

A bachelor’s degree or equivalent experience in a relevant field.
A minimum of 7 years of relevant work experience.
Strong proficiency in Java, Node.js, and the Spring Framework, with experience in building and maintaining cloud-native microservices.
Demonstrated industry experience in providing hands-on technical expertise to design, develop, deploy, secure, and optimize cloud services.
Expertise in Docker and Kubernetes, with the ability to design and implement distributed services that meet strict reliability and performance requirements.
Proficiency in using monitoring, alerting, and log aggregation tools such as Sentry and the ELK Stack.
Strong troubleshooting skills with experience in managing complex production incidents.
Knowledgeable in disaster recovery, backup strategies, and high-availability principles.
Expertise in writing reusable Terraform modules.
Skilled in identifying bottlenecks and resolving issues across infrastructure and software platforms.
Enthusiastic about working with a diverse range of…

