Senior Site Reliability Engineer Job, London area, Greater London, England, UK, IT/Tech

Location: Greater London

We are working with a leading multi-strategy hedge fund where engineering plays a critical role in the core business rather than operating as a support function. Technical teams are given real ownership, work on challenging and meaningful problems, and collaborate closely with end users, so their work has a clear impact.

They are seeking a Senior SRE to join their London HQ and work hands-on with cloud and on-prem platforms. They need someone who can significantly improve system reliability and performance across every part of their trading infrastructure.

What You’ll Get

A place in a group that focuses on modern platforms and high engineering standards, and that rewards strong performance.
An environment where technologists can grow, innovate, and see the results of their contributions.
Long-term career progression as the firm continues to grow.
The chance to shape the firm’s SRE philosophy, processes, and technologies from the ground up.
Scope to drive standards and foster their adoption within your core team, whilst partnering closely with the DevOps and Cloud teams.
An opportunity to be instrumental in evolving the firm’s operations and boosting performance across diverse systems and platforms.

What You’ll Do

Define and embed SRE principles, creating processes and standards that support scalable and reliable infrastructure.
Design and maintain comprehensive monitoring and observability using Prometheus, Grafana, Loki, and Tempo to ensure clear insights into system and application performance.
Participate in the team’s on-call rotation, sharing responsibility for approximately one week per month.
Set and maintain reliability requirements for applications running in Kubernetes, balancing performance, cost efficiency, and system resilience.
Develop tools and automation to improve deployment pipelines, system health checks, and recovery procedures.
Work closely with development teams to improve service stability, scalability, and fault tolerance, applying best practices such as SLOs and blameless post-mortems.

What You’ll Need

5+ years in SRE or similar roles with complex, distributed systems
Degree in engineering, computer science, or equivalent experience
Expert in Prometheus, Grafana, Loki, Tempo, OpenTelemetry (OTel), and related observability tooling
Skilled with Kubernetes, Docker, and containerised environments
Hands-on with cloud (AWS preferred) and on-prem infrastructure
Proficient in Python, Bash, or Go for automation and pipelines
Solid grasp of CI/CD, DevOps, and agile workflows
Self-starter with a passion for reliability and operational excellence
Strong communicator, able to translate technical concepts across teams

Bonus Skills

Experience with databases (PostgreSQL, Redis, Snowflake), messaging systems (Kafka, Solace), or workflow orchestration (Airflow)

If you’re a motivated Senior SRE looking to step into an ideas-driven, high-performance environment, we’d love to hear from you.
