Senior Site Reliability Engineer
Listed on 2026-01-25
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability, IT Support
We are working with a leading multi-strategy Hedge Fund where engineering plays a critical role in the core business rather than operating as a support function. Technical teams are given real ownership, work on challenging and meaningful problems, and collaborate closely with end users, ensuring their work has a clear impact.
They are seeking a Senior SRE to join their London HQ and work hands-on with cloud and on-prem platforms. They need someone to strengthen system reliability and improve performance across every part of their trading infrastructure.
What You'll Get
- The chance to join a group that focuses on modern platforms, high engineering standards, and rewarding strong performance.
- An environment where technologists can grow, innovate, and see the results of their contributions.
- Long-term career progression as the firm continues to grow.
- The chance to cultivate their SRE philosophy, processes, and technologies from the ground up.
- Drive standards and foster adoption within your core team, whilst closely partnering with our DevOps and Cloud teams.
- An opportunity to be instrumental in evolving our operations and boosting performance across diverse systems and platforms.
- Define and embed SRE principles, creating processes and standards that support scalable and reliable infrastructure.
- Design and maintain comprehensive monitoring and observability using Prometheus, Grafana, Loki, and Tempo to ensure clear insights into system and application performance.
- Participate in the team’s on-call rotation, sharing responsibility for approximately one week per month.
- Set and maintain reliability requirements for applications running in Kubernetes, balancing performance, cost efficiency, and system resilience.
- Develop tools and automation to improve deployment pipelines, system health checks, and recovery procedures.
- Work closely with development teams to improve service stability, scalability, and fault tolerance, applying best practices such as SLOs and blameless post-mortems.
- 5+ years in SRE or similar roles with complex, distributed systems
- Degree in engineering, computer science, or equivalent experience
- Expert in Prometheus, Grafana, Loki, Tempo (OTEL) and observability tooling
- Skilled with Kubernetes, Docker, and containerised environments
- Hands-on with cloud (AWS preferred) and on-prem infrastructure
- Proficient in Python, Bash, or Go for automation and pipelines
- Solid grasp of CI/CD, DevOps, and agile workflows
- Self-starter with a passion for reliability and operational excellence
- Strong communicator, able to translate technical concepts across teams
Bonus Skills:
Experience with databases (PostgreSQL, Redis, Snowflake), messaging systems (Kafka, Solace), or workflow orchestration (Airflow)
If you’re a motivated Senior SRE looking to step into an ideas-driven, high-performance environment, we’d love to hear from you.