Lead Site Reliability Engineer

Job in Newcastle upon Tyne, Newcastle, Tyne and Wear, SY7, England, UK
Listing for: Arbuthnot Latham
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    IT Support, Systems Engineer
Job Description
Location: Newcastle upon Tyne

Team Lead - Site Reliability Engineering

Arbuthnot Latham has been associated with banking since 1833. We combine private and commercial banking, wealth planning and investment management. We believe in traditional, relationship- and service-led banking, powered by modern technology.

Job purpose

The Team Lead - Site Reliability Engineering is responsible for ensuring the effective and efficient running of the current NOC (network operations centre) team, with a view to transitioning it to an SRE function over time. The team is responsible for enabling innovation and velocity of change while ensuring system reliability, focusing on the critical features and functionality within products and platforms. It collaborates with the business or product owners to prioritise operational requirements by defining service-level indicators (SLIs) and service-level objectives (SLOs) that are used to monitor and optimise the customer journey and experience.

Its goal is to design and operate scalable, resilient systems using software engineering principles. It brings skills and expertise to automating manual, repetitive tasks (toil) in areas such as incident, problem, change and release management; provides operational insight through monitoring and observability; and supports other aspects of preparing and optimising automated delivery solutions.

The role holder is expected to place the interests of customers at the centre of all activities, act in a way that is consistent with achieving good outcomes for consumers, and comply with the FCA and PRA’s Conduct Rules.

Key Responsibilities
  • Lead, manage and motivate the team.
  • Ensure the team are following best practice across all disciplines.
  • Oversee team tasks, including investigation, troubleshooting, diagnosis, resolution and recovery, to minimise the impact on services.
  • Audit the Engineers’ calls and tickets for quality assurance and provide feedback and coaching as required.
  • Drive a culture of Customer Excellence and Continual Service Improvement within the team.
  • Identify, develop, communicate, and implement process changes within the team.
  • Act as a point of escalation for the team.
SRE responsibilities
  • Help define the organisation's SRE practice: collaborate with other stakeholders to select the relevant SRE principles, and define the objectives and how outcomes will be measured.
  • Collaborate with stakeholders, such as product and platform owners, to define service-level objectives (SLOs) and service-level indicators (SLIs) for system operations, focused on the critical features of the customer journey and experience.
  • Track and manage reliability performance against agreed SLOs, in partnership with other IT teams or other stakeholders, and ensure systems continue to meet SLOs over time.
  • Ensure key stakeholders, product owners and platform owners are informed of reliability concerns and their potential impact on the customer experience.
  • Provide expert knowledge of reliability approaches to ensure our organisation achieves its reliability goals and roadmap.
  • Champion treating reliability as a feature in products and platforms, and promote the concept across all phases of the software development life cycle.
  • Create dashboards and reports to communicate key metrics to product owners and key stakeholders.
  • Design, code, test and deliver solutions to automate manual operational work (i.e., toil).
  • Participate in operations support and on-call rotations for SRE-supported systems and products.
  • Participate in or lead problem management activities, including post-mortem incident analysis and the provision of technical insight, documented findings, outcomes and recommendations as part of root cause analysis to troubleshoot priority incidents.
  • Implement automation to reduce the probability and/or impact of problems recurring (for example, automated incident response, enhanced monitoring, observability initiatives, or automation of change and release management).
  • Identify, evaluate, and recommend monitoring and observability tools and diagnostic techniques to improve system observability and insights, including identification…