Director - Site Reliability Engineering Job, Lowell area, Massachusetts, USA, IT/Tech

Why UKG

At UKG, the work you do matters. The code you ship, the decisions you make, and the care you show a customer all add up to real impact. Today, tens of millions of workers start and end their days with our workforce operating platform. Helping people get paid, grow in their careers, and shape the future of their industries. That’s what we do.

We never stop learning. We never stop challenging the norm. We push for better, and we celebrate the wins along the way. Here, you’ll get flexibility that’s real, benefits you can count on, and a team that succeeds together. Because at UKG, your work matters—and so do you.

UKG is seeking a seasoned Director of Site Reliability Engineering (SRE) to help lead and shape reliability at enterprise scale. You will be responsible for the reliability, resilience, and operational excellence of UKG’s platforms worldwide.

This is a high-impact leadership role within a mature, mission-critical environment. You will inherit and lead an established SRE organization responsible for a large, complex ecosystem comprising hundreds of applications across a hybrid infrastructure.

Success in this role calls for strong systems thinking, operational leadership at scale, and the ability to influence across boundaries. You will drive consistent reliability practices across diverse technologies, modernize how reliability is delivered, and lead globally distributed teams in service of always-on, customer-critical platforms.

Responsibilities

Production Reliability & Application Behavior

Own reliability outcomes across a large, heterogeneous application portfolio, including availability, performance, scalability, and recoverability
Ensure applications meet defined reliability expectations as they operate on both on-prem and cloud platforms
Lead and participate in major incident response, acting as a senior escalation point and ensuring effective executive communication
Drive post-incident learning and systemic improvements to reduce repeat issues

Platform-Facing SRE Execution

Lead teams responsible for understanding how applications behave in production, including runtime performance, resource utilization, and failure modes
Partner with Infrastructure, Cloud, Security, and Product Engineering teams to address cross-layer reliability concerns
Establish standards for operational readiness, release safety, capacity planning, and disaster recovery across platforms

SRE Practice Consistency at Scale

Apply Site Reliability Engineering principles pragmatically across both legacy and cloud-native systems, including:
SLOs and reliability targets
Error budgets and risk-based decision-making
Toil identification and reduction
Automation and self-healing where appropriate
Observability to support incident response, performance analysis, and capacity management
Ensure SRE practices are consistent in intent but adapted in implementation across different technologies and environments

People Leadership & Organizational Health

Lead and develop SRE managers and engineers across a global organization
Inherit existing teams and improve clarity of ownership, execution discipline, and engagement
Hire and develop senior SRE leaders capable of operating across both cloud and enterprise platforms

Strategy, Planning & Influence

Translate business priorities into reliability-focused technical initiatives
Partner with senior Product and Engineering leadership to balance delivery velocity, reliability, and operational risk
Own and execute against a portion of the SRE roadmap, ensuring transparency, prioritization, and measurable outcomes
Advocate for reliability improvements using data, production insight, and operational experience

Qualifications

Required Qualifications

10+ years of experience in software engineering, systems engineering, SRE, or related disciplines
Proven experience leading established, globally distributed engineering organizations
Strong understanding of production systems and application behavior at scale
Experience operating and leading teams across hybrid environments (on-prem and public cloud)
Demonstrated ability to influence outcomes in a matrixed enterprise environment
Experience owning incident…

