Production Engineer
Listed on 2026-02-21
IT/Tech
Systems Engineer, IT Support
Our client is a globally recognised hedge fund operating at significant scale across multiple asset classes. Technology is not a support function here; it’s core to performance.
With a sophisticated, latency-sensitive trading environment and a culture that prizes engineering excellence, they are investing heavily in modernising production reliability. As complexity increases, so too does the importance of operational judgement, system design discipline, and resilient architecture.
They are now hiring a Production Engineer to join a high-calibre global team responsible for the reliability and performance of business-critical systems.
What You’ll Get
- Direct exposure to highly time-sensitive, revenue-impacting systems
- A seat at the table with engineers, quants and trading stakeholders
- The opportunity to define and raise observability standards firm-wide
- Scope to build tooling, automation and internal platforms that reduce operational toil
- Real ownership of incidents, root cause analysis and long-term systemic improvements
- A collaborative, high-accountability culture that rewards initiative and technical depth
- Competitive compensation, discretionary bonus and a strong benefits package
This is not a ticket-closing support role. It’s engineering-led reliability at scale.
What You’ll Do
- Own the reliability of trading‑critical systems from design through to production stability
- Lead high‑severity incident response with calm technical authority
- Drive post‑incident reviews that result in meaningful systemic change
- Define and implement consistent observability standards (metrics, traces, logging)
- Improve release safety and operational excellence through automation and tooling
- Write production‑grade code (Python preferred; other modern languages welcomed)
- Partner closely with development and trading teams to prevent issues before they surface
- Break down ambiguous reliability challenges into practical, incremental deliverables
What You’ll Bring
- Experience operating and debugging distributed systems in high‑availability environments
- Strong grounding in SRE principles (SLIs/SLOs, observability, incident leadership)
- High proficiency in at least one modern programming language (Python strongly preferred)
- Familiarity with modern observability ecosystems (e.g. OpenTelemetry‑style tooling stacks)
- The ability to communicate clearly under pressure with both engineers and non‑technical stakeholders
- A mindset built on ownership, accountability and continuous improvement
If you are a reliability‑focused engineer who enjoys solving hard production problems in environments where correctness and latency genuinely matter, this opportunity warrants a confidential conversation.