
Site Reliability Engineer - Eng

Job in Lowell, Middlesex County, Massachusetts, 01856, USA
Listing for: Fairygodboss
Full Time position
Listed on 2026-05-30
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Systems Engineer, Cloud Computing, IT Support
Salary/Wage Range or Industry Benchmark: USD 129,500 - 186,100 per year
Job Description & How to Apply Below
Position: Staff Site Reliability Engineer - Eng

About the Team

Staff Site Reliability Engineers (SREs) at UKG are individual contributors who play a critical role in ensuring the reliability, scalability, and performance of our services. They bring a breadth of knowledge across service delivery and apply software engineering principles to operational challenges.

In this role, you will ensure the reliability, availability, and performance of production systems by applying software engineering practices to operations. SREs proactively monitor system health, manage risk through SLOs and error budgets, lead incident response, and enable safe, rapid change while balancing reliability and delivery velocity.

Staff SREs are passionate about learning and evolving with modern technologies. They strive to innovate and relentlessly pursue an excellent customer experience, with an automate-everything mindset that enables services to be delivered with speed, consistency, and high availability.

This is a senior individual contributor role, focused on technical leadership, influence, and reliability impact.

About the Role and Job Responsibilities
  • Engage in and improve the lifecycle of services from conception to end-of-life, including system design reviews, capacity planning, and production readiness.
  • Define and implement standards and best practices for system architecture, service delivery, reliability, and automation, including the definition and monitoring of service health indicators (latency, traffic, error rates, and resource saturation), service level objectives (SLOs), and the use of error budgets to guide operational and delivery decisions.
  • Support service, product, and engineering teams by providing common tooling and frameworks to increase availability and improve incident detection and response.
  • Improve system performance, availability, and efficiency through automation, process refinement, post-incident reviews, and in-depth configuration analysis.
  • Collaborate closely with engineering teams across the organization to deliver and operate reliable services.
  • Increase operational efficiency, effectiveness, and service quality by treating operational challenges as software engineering problems (reducing toil).
  • Guide junior team members and serve as a champion for Site Reliability Engineering best practices.
  • Actively participate in incident responses, including on-call rotations and post-incident reviews, collaborating with engineering teams to restore service and reduce recurrence.
  • Partner with stakeholders to influence and help drive the best possible technical and business outcomes.
Required Qualifications
  • 5+ years of hands‑on experience in software engineering, systems engineering, or cloud‑based environments.
  • 5+ years of experience working with public cloud platforms (e.g., GCP (preferred), AWS, or Azure).
  • 5+ years of experience configuring, operating, and maintaining applications and/or systems infrastructure in a large‑scale, customer‑facing environment.
  • Demonstrated understanding of observability best practices, including metric generation and collection, log aggregation pipelines, time‑series databases, and distributed tracing.
  • Experience coding in one or more higher‑level programming languages (e.g., Python, Java, or C++).
  • Strong working knowledge of Linux systems, including troubleshooting, performance analysis, and scripting in production environments.
  • Experience with GitHub Actions and modern CI/CD practices.
  • Experience building operational dashboards and alerts using observability tools such as Splunk or Grafana.
  • Excellent communication and collaboration skills, with experience mentoring and guiding engineers.
Preferred Qualifications
  • Experience with distributed system design and architecture.
  • Hands‑on experience with cloud‑native applications and containerization technologies (Kubernetes, containers).
  • Experience with infrastructure‑as‑code and configuration management tools (e.g., Terraform, Ansible).
  • Experience operating production workloads in Google Cloud Platform (GCP).
  • Solid grounding in at least two of the following areas:
    Computer Science fundamentals, Cloud Architecture, Security, or Network Design.

THIS POSITION IS 3 DAYS ON SITE IN LOWELL, MA

E…