Site Reliability Engineer, Associate

Job in City of Edinburgh, Edinburgh, City of Edinburgh Area, EH1, Scotland, UK
Listing for: LGBT Great
Full Time position
Listed on 2026-02-05
Job specializations:
  • IT/Tech
    SRE/Site Reliability, Cloud Computing, Systems Engineer, IT Support
Salary/Wage Range or Industry Benchmark: 100000 - 125000 GBP Yearly
Job Description & How to Apply Below
Location: City of Edinburgh

About this role

We’re looking for an SRE with strong Kafka experience and a deep understanding of SRE best practices. You’ll combine hands-on technical improvements with the ability to delegate work effectively to Event Bus developers.

You’ll collaborate closely with the Event Bus, Kafka, Telemetry, and Incident Response teams, while also working independently to improve monitoring, reduce noise, strengthen alerting, and track remediation progress.

This role sits at the centre of a global platform used by hundreds of developers and joins a fast-growing, experienced SRE group based in Edinburgh.

The Team

The Aladdin Event Bus is built on Kafka and enables teams to publish and subscribe to distributed events in near real time. As part of the Aladdin Graph group, a core Platform Engineering function, the Event Bus team supports developers across the firm in designing, building, and operating event-driven and API-based systems.

Event Bus is now a critical dependency for key applications, including our release system and API infrastructure. This drives a high bar for availability, incident responsiveness, and operational excellence. The SRE function supports this by improving observability, streamlining incident processes, and identifying gaps that meaningfully improve platform reliability.

Key Responsibilities

As the SRE for Event Bus, you will drive stability, resiliency, and observability through:

  • Staying informed on all Event Bus incidents, including impact, root cause, detection, and ongoing remediation
  • Responding to incidents calmly and efficiently, communicating clearly with reporters and partner teams, and recommending remediations based on urgency and impact
  • Proposing improvements informed by prior incidents, potential risks, and industry standards—e.g., new metrics, SLOs, fallback mechanisms
  • Leading incident retrospectives and sharing insights with the wider team
  • Creating and distributing postmortems for high-impact operational events
  • Collaborating with developers to write, maintain, and promote runbooks and playbooks
  • Improving alert quality and reducing alert fatigue by tuning signal-to-noise ratios
  • Designing and implementing automated recovery solutions for known issues
  • Building a roadmap toward 24/7 availability, rapid failover recovery, self-detection, and automated resolution of common issues
  • Helping Event Bus users diagnose issues with their own producers and consumers
Requirements
  • 3+ years in an SRE role, including experience with defining and managing SLOs
  • Strong understanding of SRE principles (Golden Signals, error budgets, synthetic monitoring, signal-to-noise optimisation)
  • Extensive hands-on experience with Kafka
  • Experience using monitoring tools (Grafana and Splunk preferred), including building dashboards, alerts, and reports
Suggested Requirements
  • Java Developer Experience:
    Experience with Java or another object-oriented language
  • CI/CD & Release Management:
    Experience managing pipelines using Azure DevOps or other Git-based tools
  • Cloud Experience:
    Practical experience with at least one public cloud provider, preferably Azure or AWS
  • Agile Development:
    Familiarity with agile ways of working, sprint ceremonies, and backlog planning
  • Scripting & Automation:
    Proficiency in Python or Golang for automating operational tasks
  • Monitoring & Observability:
    Strong understanding of logging, monitoring, and observability practices, including writing integration scripts
  • Collaboration & Communication:
    Strong cross-team collaboration skills and excellent written and verbal communication
Our benefits

To help you stay energized, engaged and inspired, we offer a wide range of employee benefits including: retirement investment and tools designed to help you in building a sound financial future; access to education reimbursement; comprehensive resources to support your physical health and emotional well-being; family support programs; and Flexible Time Off (FTO) so you can relax, recharge and be there for the people you care about.

Our hybrid work model

BlackRock’s hybrid work model is designed to enable a culture of collaboration and apprenticeship that enriches the experience of our employees, while supporting flexibility for all. Employees…

Position Requirements
10+ Years work experience