Sr. Network Site Reliability Engineer (SRE) Job, Central London area, City of London, England, UK, IT/Tech

Position: Sr. Network Site Reliability Engineer (SREs)
Location: City Of London

Overview

We are seeking a highly experienced Senior Network SRE with deep expertise across multi-vendor network infrastructure, automation, and reliability engineering. The ideal candidate will possess strong technical leadership, hands-on engineering capabilities, and a passion for building resilient, scalable, and observable network environments.

Key Responsibilities

Design, implement, and maintain highly available network solutions across routing, switching, firewalling, and wireless technologies.
Apply SRE principles to improve network reliability, scalability, and performance.
Develop and maintain automation workflows using Ansible, Salt, and related frameworks to reduce operational toil.
Build and operate monitoring, alerting, and observability dashboards using tools such as Grafana and Splunk.
Proactively identify network bottlenecks, performance issues, and reliability risks, implementing long-term fixes rather than reactive solutions.
Support incident response, root cause analysis, and post-incident reviews with a focus on continuous improvement.
Collaborate with cross-functional engineering, security, and operations teams to ensure network solutions meet business and technical requirements.
Contribute to documentation, runbooks, design artifacts, and operational standards.
Participate in capacity planning, network modernization initiatives, and automation-first strategies.

Required Skills & Experience

10+ years of hands-on experience in enterprise or service provider network engineering.
Expertise in multi-vendor routing, switching, firewalling, and wireless technologies.
Deep understanding of network protocols (BGP, OSPF, EIGRP, STP, VXLAN, VPNs, QoS, MPLS, etc.).
Strong experience with infrastructure automation using Ansible and Salt.
Proficiency with observability tooling such as Grafana, Splunk, or equivalent.
Solid understanding of SRE practices including SLIs, SLOs, error budgets, and proactive reliability engineering.
Strong troubleshooting, analytical, and performance optimization skills.
Excellent communication and collaboration skills, with the ability to influence and guide technical stakeholders.

Nice to Have

Experience with network programmability (Python, API-driven networking, NETCONF/RESTCONF).
Exposure to cloud networking (AWS, Azure, GCP).
Knowledge of zero-trust, SD-WAN, and network security best practices.
Experience creating self-healing or fully automated network workflows.
