Site Reliability Engineer

Job in Zürich, 8058, Kanton Zürich, Switzerland

Listing for: Selby Jennings

Full Time position
Listed on 2026-06-03

Job specializations:

IT/Tech
Systems Engineer, SRE/Site Reliability, Cloud Computing: Infrastructure & Operations, IT Support

Salary/Wage Range or Industry Benchmark: CHF 100,000 - 125,000 per year

Location: Zürich

Overview

Our client, a leading proprietary trading firm specialising in both systematic and discretionary strategies, is seeking a Site Reliability Engineer to join their Zurich office. This is a unique opportunity to evolve and enhance a highly sophisticated production trading environment, ensuring exceptional uptime and performance. The role focuses on delivering code-driven solutions while partnering closely with developers and traders to strengthen reliability, observability, and overall operational maturity within a low-latency, high-performance ecosystem.

The ideal candidate will bring deep experience supporting highly available, performance-critical, latency-sensitive systems, alongside a strong understanding of Linux internals and networking. A solid background in reliability engineering is essential, with a clear automation-first mindset and hands-on experience with containerisation technologies.

Responsibilities

Reliability & Production Ownership:
Own availability, stability, and performance of Linux-based trading systems (Red Hat, Rocky, Ubuntu).
Incident Response:
Lead incident management, on-call, and blameless post-mortems, driving automation to prevent recurrence.
Operational Processes:
Maintain runbooks, documentation, and standards for consistent production support.
Production Readiness:
Partner with developers and traders to ensure reliable, high-performance system design and deployment.
Linux Systems & Performance:
Perform low-level tuning (CPU, IRQ, memory, networking) for latency-sensitive workloads.
Performance Diagnostics:
Troubleshoot using perf, ftrace, tcpdump, and eBPF.
Automation & Infrastructure:
Deliver infrastructure as code with Ansible, Terraform, Python, and shell scripting.

Required Qualifications

Experience in Site Reliability Engineering, Linux engineering, DevOps, or infrastructure-focused roles.
Production Systems:
Proven experience supporting highly available, performance-sensitive production environments.
Linux Expertise:
Deep knowledge of Linux internals, including scheduling, memory management, interrupts, file systems, and storage.
Networking:
Strong understanding of TCP/IP, UDP, multicast, and distributed systems networking.
Automation & Tooling:
Proficiency with Ansible, Terraform, Python, shell scripting, YAML/JSON, and Git-based workflows.
Containers & Observability:
Experience with Docker (or similar) and familiarity with observability tools such as Prometheus, Grafana, ELK, or equivalent.
