Lead Site Reliability Engineer

Job in Belfast, County Antrim, BT1, Northern Ireland, UK
Listing for: Realtime Recruitment
Full Time position
Listed on 2026-02-17
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Support
Job Description & How to Apply Below

My Belfast-based client is looking for a Site Reliability Engineer to lead reliability strategy for large-scale, production-critical systems. This is a senior, hands-on role where you’ll influence architecture, improve resilience, and drive reliability improvements across multiple engineering teams. You’ll thrive here if you enjoy operating complex production environments, shaping technical direction, and mentoring engineers while staying close to the technology.

As the Lead SRE, you’ll provide technical leadership for product reliability, observability, and operational excellence. You’ll define reliability standards, shape the roadmap, and deliver high-impact improvements across cloud-based platforms.

About the role:

  • Lead product reliability strategy, defining and owning the Reliability Roadmap
  • Design and implement SLIs, SLOs, and error budgets aligned to customer experience
  • Drive observability and monitoring using metrics, logs, and distributed tracing
  • Partner with product and engineering leads on reliability, performance, capacity, and DR testing
  • Act as a senior escalation point during on-call and major incident response
  • Lead post-incident reviews and prioritise both short-term fixes and long-term improvements
  • Reduce operational toil through automation and shift-left practices
  • Lead production reviews using SLO, incident, and reliability data
  • Represent SRE in architecture and design decisions, prioritising resilience and scalability
  • Mentor engineers and champion SRE best practices
  • Support the migration of applications to Google Cloud Platform (GCP)
  • Optimise capacity, performance, and cost without impacting reliability
  • Build proofs of concept (POCs) that can be reused across teams

Requirements:

  • 5+ years' hands-on experience in a Site Reliability Engineering role
  • Strong knowledge of Linux-based systems and distributed system architectures
  • Experience with cloud platforms, ideally Google Cloud Platform (GCP), GCE, and/or GKE
  • Strong programming and automation skills (Python, Bash, Terraform, Ansible; Java a plus)
  • Experience with CI/CD automation and modern delivery pipelines
  • Deep familiarity with observability and monitoring tools (Prometheus, Grafana, Splunk, or similar)
  • Solid understanding of networking fundamentals and messaging middleware
  • Proven ability to influence technical direction and lead cross-team initiatives
  • Excellent communication and stakeholder management skills

Highly Desirable:

  • Experience building or supporting financial systems or trading platforms
  • Exposure to ultra-low latency (ULL) environments
  • Experience working in Agile teams

Apply now or email your CV to shane.doolins

Must be Belfast-based with full working rights for Northern Ireland
