Senior Software Engineer/Reliability Engineering - Data Job London area,Greater London England UK,IT/Tech

Position: Senior Software Engineer / Reliability Engineering - Real-time Data
Location: Greater London

Description & Requirements

Our department is responsible for efficiently distributing financial data, such as stock prices and foreign exchange rates, from its source to interested users all around the world. Data can either be served in response to a request or streamed in real time.

Location: London

Business Area: Engineering and CTO

The Group Owns

The distribution software and infrastructure
A range of different sources of data
Supporting services to administer and manage the system, including permissioning and metering

The team is also responsible for the Enterprise endpoint (“B-PIPE”), which allows end‑users to programmatically consume data via our SDK. Data is also available through the Bloomberg Terminal and Microsoft Excel.

The main challenge faced by the group is one of scale. Data is sourced from more than 370 global exchanges, with a combined volume in excess of 60 billion messages each day. We deliver this data to hundreds of thousands of terminals and thousands of B‑PIPEs. Handling this volume requires significant infrastructure; we manage multiple clusters in our main data centres, as well as a network of many thousands of servers around the world.

Group Overview

The RD Reliability Engineering group comprises three sub‑teams located in Tokyo, London, and New York, providing follow‑the‑sun support.

Our mission is to make systems reliable, scalable, and observable through software engineering, while continuously improving how they behave under load and failure conditions. We work in an outcome‑driven model, focusing on measurable improvements in availability, latency, capacity, and recovery. Our goal is for systems to meet defined service level objectives while minimising manual operational effort through automation and software solutions.

London

Team Focus – Availability & Resiliency

The London team plays a key role in ensuring the availability and resiliency of RD infrastructure globally.

We Focus On

Detecting and preventing failures across large‑scale distributed systems
Ensuring infrastructure demonstrates sufficient capacity and failover capability during site‑loss scenarios
Reducing time to detect, diagnose, and recover from incidents
Ensuring systems behave predictably under both normal and adverse conditions

What You’ll Do

Build and maintain production‑grade software supporting Bloomberg’s global distribution infrastructure
Design and implement scalable, fault‑tolerant systems with a focus on observability, performance, and automation
Analyse system behaviour under real‑world and failure scenarios to validate that capacity, failover, and recovery meet resilience objectives
Identify bottlenecks, scaling limits, and reliability risks across distributed systems
Improve detection, diagnosis, and prevention of production issues
Build tools and frameworks to increase system visibility and reduce time to detect and resolve incidents
Automate operational workflows to reduce manual effort and improve system reliability
Partner with application and infrastructure teams to improve system design, resilience, and performance
Contribute to design discussions, incident reviews, and reliability improvements across the platform

Systems You’ll Work With

Configuration systems serving thousands of servers across the global network
Service discovery and clustering systems for distributed infrastructure
Monitoring and observability frameworks for large‑scale server estates
Tooling for diagnosing data quality and distribution issues
Ownership of systems may evolve over time as the team focuses on areas of highest impact

What Success Looks Like

Systems consistently meet defined reliability, latency, and capacity objectives
Issues are detected and mitigated before significant customer impact
Systems are demonstrably resilient, with proven failover capability and sufficient capacity under failure conditions
Operational processes are automated and scalable
Reliability is achieved through engineering improvements rather than manual intervention

What We’re Looking For

We’re not a traditional SRE team. We engineer reliability through software, building solutions that automate operations and improve system…