Site Reliability Engineer
Listed on 2026-06-03
IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, IT Support
Overview
Our client, a leading proprietary trading firm specialising in both systematic and discretionary strategies, is seeking a Site Reliability Engineer to join their Zurich office. This is a unique opportunity to evolve and enhance a highly sophisticated production trading environment, ensuring exceptional uptime and performance. The role focuses on delivering code-driven solutions while partnering closely with developers and traders to strengthen reliability, observability, and overall operational maturity within a low-latency, high-performance ecosystem.
The ideal candidate will bring deep experience supporting highly available, performance-critical, latency-sensitive systems, alongside a strong understanding of Linux internals and networking. A solid background in reliability engineering is essential, with a clear automation-first mindset and hands-on experience with containerisation technologies.
Responsibilities
- Reliability & Production Ownership: Own availability, stability, and performance of Linux-based trading systems (Red Hat, Rocky, Ubuntu).
- Incident Response: Lead incident management, on-call, and blameless post-mortems, driving automation to prevent recurrence.
- Operational Processes: Maintain runbooks, documentation, and standards for consistent production support.
- Production Readiness: Partner with developers and traders to ensure reliable, high-performance system design and deployment.
- Linux Systems & Performance: Perform low-level tuning (CPU, IRQ, memory, networking) for latency-sensitive workloads.
- Performance Diagnostics: Troubleshoot using perf, ftrace, tcpdump, and eBPF.
- Automation & Infrastructure: Deliver infrastructure as code with Ansible, Terraform, Python, and shell scripting.
Requirements
- Experience in Site Reliability Engineering, Linux engineering, DevOps, or infrastructure-focused roles.
- Production Systems: Proven experience supporting highly available, performance-sensitive production environments.
- Linux Expertise: Deep knowledge of Linux internals, including scheduling, memory management, interrupts, file systems, and storage.
- Networking: Strong understanding of TCP/IP, UDP, multicast, and distributed systems networking.
- Automation & Tooling: Proficiency with Ansible, Terraform, Python, shell scripting, YAML/JSON, and Git-based workflows.
- Containers & Observability: Experience with Docker (or similar) and familiarity with observability tools such as Prometheus, Grafana, ELK, or equivalent.