Site Reliability Engineer – Operations
Job in: Manila, Daggett County, Utah, 84046, USA
Listed on: 2026-06-21
Listing for: RealPage, Inc.
Full Time position
Job specializations:
- IT/Tech
- SRE/Site Reliability
Job Description
Overview
The SRE Ops Engineer reports to the Sr. Director of Reliability Engineering and is responsible for ensuring product stability, operational excellence, and a strong customer experience across critical platforms, with a primary focus on Windows‑based environments. This role partners closely with Engineering, Cloud Ops, InfoSec, and QA to reduce incidents, improve system reliability, and drive operational rigor through automation, monitoring, and incident management.
Responsibilities
- Manage and support Windows‑based production environments, including IIS, Windows Services, Active Directory, and related infrastructure
- Build, maintain, and enhance monitoring, alerting, and observability frameworks using ELK or equivalent platforms
- Lead incident response, troubleshooting, and root cause analysis (RCA) for customer‑impacting issues
- Improve system reliability by reducing critical incidents and driving down Mean Time to Resolution (MTTR)
- Develop and maintain automation using scripting tools such as PowerShell, Python, or similar technologies
- Support high‑availability, high‑performance production systems and participate in on‑call rotations
- Collaborate with cross‑functional teams to ensure platform stability, security, and reliability
- Contribute to platform upgrades, patching, modernization initiatives, and operational best practices
- Create and maintain runbooks, operational standards, and documentation
Qualifications
- 5+ years of experience in Windows Server environments, including IIS and Windows Services
- 5+ years of experience with monitoring and observability tools (ELK stack or equivalent)
- Strong experience with incident management, troubleshooting, and root cause analysis
- Hands‑on experience with automation and scripting (PowerShell, Python, etc.)
- Working knowledge of Linux systems for basic administration and troubleshooting
- Strong understanding of system performance, scalability, and operational best practices
- Experience supporting production systems with high availability requirements
- Familiarity with cloud platforms (AWS, GCP, Azure) is a plus
- Exposure to CI/CD tools and DevOps practices
- Strong communication and collaboration skills, with an ownership mindset
- Ability to operate effectively in a fast‑paced, production‑focused environment