Site Reliability Engineer

Job in Glasgow, Glasgow City Area, G1, Scotland, UK
Listing for: Intermedia Intelligent Communications
Full Time position
Listed on 2026-05-31
Job specializations:
  • IT/Tech
    IT Support, Systems Engineer, SRE/Site Reliability, Cloud Computing
Department: Tech Operations

Employment Type: Full Time

Location: United Kingdom

Reporting To: Taras Kapanaiko

Description

ALL CANDIDATES MUST BE LOCATED IN THE UK

Intermedia is a leading provider of cloud communications and collaboration tech. We have a strong track record of growth, profitability, and creating an environment where everyone matters. While we are fast‑paced and admittedly a bit intense, we promise that you won’t be bored. You will find Intermedia is a place where you can indulge your passion for creating and supporting great cloud technology.

About the role:

We are looking for an SRE to improve reliability and operational readiness with a strong focus on metrics, alerting, and event management. You will build and maintain monitoring using Prometheus/VictoriaMetrics, integrate alerts and events with BigPanda, and participate in on‑call rotations to drive fast incident response and continuous improvement across Windows and Linux environments.

Key Responsibilities
  • Build and operate metrics/monitoring platforms: Prometheus and/or VictoriaMetrics (scrape configs, exporters, recording rules)
  • Design and maintain alerting strategy: thresholds, anomaly detection where applicable, alert routing, deduplication, and noise reduction
  • Integrate monitoring/alerting and events with BigPanda (correlation, enrichment, routing, incident workflows)
  • Create and maintain dashboards and operational visibility (Grafana or equivalent)
  • Develop and maintain runbooks, operational playbooks, and incident response procedures
  • Participate in on‑call shifts: triage alerts, manage incidents, coordinate response, and lead communication during outages
  • Perform root‑cause analysis, postmortems, and implement corrective/preventive actions
  • Improve service reliability via SLOs/SLIs, capacity planning, and automation to reduce toil
  • Support monitoring for core infrastructure and services on Windows and Linux, including HA components and clusters
  • Collaborate with DevOps/Engineering to instrument applications and standardize telemetry (metrics, logs, traces where applicable)

Skills, Knowledge and Expertise
  • Experience in SRE / Operations / DevOps with production incident ownership
  • Hands‑on experience with Prometheus and/or VictoriaMetrics (exporters, alert rules, recording rules, troubleshooting)
  • Experience integrating alerting/event pipelines with BigPanda (or similar event correlation tools)
  • Strong troubleshooting skills across Linux and Windows systems (networking, OS, services)
  • Ability to build reliable alerting with minimal noise (correlation, grouping, suppression, maintenance windows)
  • Experience with Git‑based workflows for monitoring‑as‑code and configuration management

Nice to Have
  • Grafana administration and dashboard design standards
  • Log management (ELK/EFK, Loki) and/or tracing (OpenTelemetry)
  • Automation skills (Python, PowerShell, Bash) and configuration tools (Ansible)
  • Messaging/cache/proxy operations: RabbitMQ, Redis, Nginx
  • Experience with Windows clustering or HA environments
  • Experience defining SLOs/SLIs and operational KPIs
  • Experience managing VoIP components and protocols (SIP, FreeSWITCH, OpenSIPS, session border controllers)
  • Experience with load balancing components (F5 LTM, F5 GTM)
  • Experience with virtualization platforms such as VMware or Hyper-V
  • Experience with administering AWS or Azure tenants

On-call Expectations
  • Participation in a rotating on‑call schedule (including nights/weekends as needed)
  • Ownership of incident response: rapid triage, escalation, mitigation, and follow‑up improvements
  • Commitment to improving monitoring quality to reduce alert fatigue and improve MTTR

Diversity, Inclusion, and Equal Opportunity

We hire, promote, and compensate employees based on their ability to perform their job responsibilities, without regard to race, color, creed, religion, sex, gender, marital status, national origin, ancestry, age, citizenship, physical or mental disability, sexual orientation, or any other basis protected by applicable law. We do not tolerate employment discrimination in the workplace, and we are committed to making reasonable accommodations for identified disabilities or other limitations as required by all applicable laws.

We are an equal opportunity employer and value diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.
