Site Reliability Administrator
Listed on 2026-02-07
IT/Tech
IT Support, Systems Administrator
At PGE, our work involves dreaming about, planning for, and realizing a smarter, cleaner, more enduring Oregon neighborhood. It’s core to our DNA and we haven’t stopped since we started in 1888. We energize lives, strengthen communities and drive advancements in energy that promote social, economic and environmental progress. We’re always on the lookout for people passionate about leading and being a part of teams that are advancing innovative clean energy solutions that are also affordable and accessible to all.
Site Reliability Administrator Job Overview
This role exists to support the reliability, stability, and continuous improvement of critical IT systems and services that enable day-to-day business operations. By ensuring systems are maintained, monitored, and improved in alignment with established standards, this position helps minimize service disruption and supports dependable technology outcomes across the organization. The work is impactful because it directly contributes to operational resilience, effective incident response, and the ongoing evolution of system reliability practices.
Individuals in this role collaborate with peers, managers, and stakeholders to support dependable service delivery while building a strong foundation for continuous improvement.
This position is open to two P-levels:
Staff Site Reliability Administrator (Grade 6 / P2 – Intermediate) and Site Reliability Administrator (Grade 7 / P3 – Career).
The level at which an offer is made will be determined based on the selected candidate’s qualifications, skills, and experience.
IT Infrastructure (ITOP)
Carries out agreed operational procedures, including network configuration, installation, and maintenance. Supports maintenance windows to minimize service disruption. Uses network management tools to collect and report on network load and performance statistics. Contributes to the implementation of maintenance and installation work, including patching and updates to servers, operating systems and infrastructure components. Uses standard procedures and tools to carry out defined system backups, restoring data where necessary.
May assist with patch deployment validations to ensure compliance with security and operational standards. Monitors system health, identifies operational problems, and contributes to their resolution. Contributes to reliability reviews and continuous improvement initiatives.
Incident Management (USUP)
Following agreed procedures, identifies, registers and categorizes incidents. Gathers information to enable incident resolution and promptly resolves incidents within established service level agreements (SLAs). Escalates complex issues as appropriate. Maintains records and advises relevant persons of actions taken.
Problem Management (PBMG)
Investigates problems in systems, processes and services. Assists with the implementation of agreed remedies and preventative measures.
Automation (AUTM)
Develops scripts and automation tools to streamline repetitive tasks (e.g., patching, monitoring, reporting). Maintains documentation for automated processes and workflows.
System Software (SYSP)
Uses system management software and tools to collect agreed performance statistics. Carries out agreed system software maintenance tasks and minimizes service disruption.
Change Control (CHMG)
Administers, tracks, logs, and reports on change requests, using appropriate tools, techniques and processes. Provides assistance to implement standard low-risk changes, in accordance with defined change control procedures. Provides documentation of incident resolutions, maintenance procedures and automation scripts. Contributes to knowledge transfer.
Service Level Management (SLMO)
Monitors service delivery performance metrics and liaises with managers and customers so that service level agreements (SLAs) are not breached without stakeholders first having the opportunity to plan for a deterioration in service.
Information Security (SCTY)
Assists with implementing and monitoring security policies and protocols across different systems.…