Production Engineer Job, London Area, Greater London, England, UK, IT/Tech

Location: Greater London

Our client is a globally recognised hedge fund operating at significant scale across multiple asset classes. Technology is not a support function here; it is core to performance.

With a sophisticated, latency-sensitive trading environment and a culture that prizes engineering excellence, they are investing heavily in modernising production reliability. As complexity increases, so too does the importance of operational judgement, system design discipline, and resilient architecture.

They are now hiring a Production Engineer to join a high-calibre global team responsible for the reliability and performance of business-critical systems.

What You’ll Get

Direct exposure to highly time-sensitive, revenue-impacting systems
A seat at the table with engineers, quants and trading stakeholders
The opportunity to define and raise observability standards firm-wide
Scope to build tooling, automation and internal platforms that reduce operational toil
Real ownership of incidents, root cause analysis and long-term systemic improvements
A collaborative, high-accountability culture that rewards initiative and technical depth
Competitive compensation, discretionary bonus and a strong benefits package

This is not a ticket-closing support role. It’s engineering-led reliability at scale.

What You’ll Do

Own the reliability of trading‑critical systems from design through to production stability
Lead high‑severity incident response with calm technical authority
Drive post‑incident reviews that result in meaningful systemic change
Define and implement consistent observability standards (metrics, traces, logging)
Improve release safety and operational excellence through automation and tooling
Write production‑grade code (Python preferred; other modern languages welcomed)
Partner closely with development and trading teams to prevent issues before they surface
Break down ambiguous reliability challenges into practical, incremental deliverables

What You’ll Need

Experience operating and debugging distributed systems in high‑availability environments
Strong grounding in SRE principles (SLIs/SLOs, observability, incident leadership)
High proficiency in at least one modern programming language (Python strongly preferred)
Familiarity with modern observability ecosystems (e.g. OpenTelemetry‑style tooling stacks)
The ability to communicate clearly under pressure with both engineers and non‑technical stakeholders
A mindset built on ownership, accountability and continuous improvement

If you are a reliability‑focused engineer who enjoys solving hard production problems in environments where correctness and latency genuinely matter, this opportunity warrants a confidential conversation.