Job Description & How to Apply Below
Key Responsibilities
Lead the reliability engineering function across trading infrastructure and production platforms.
Architect and operate highly available, fault-tolerant distributed systems supporting live trading environments.
Own infrastructure reliability, observability, scalability, deployment safety, and operational excellence across mission-critical systems.
Drive platform engineering initiatives across Kubernetes, CI/CD, infrastructure automation, runtime orchestration, and developer tooling.
Partner closely with trading, quant, and backend engineering teams to optimize latency, throughput, resiliency, and production stability.
Build and standardize monitoring, alerting, tracing, logging, failover testing, disaster recovery, and incident response frameworks.
Lead root cause analysis and resolution for complex production and distributed systems issues.
Strengthen infrastructure security, auditability, secrets management, and operational governance across trading environments.
Improve engineering productivity through automation, internal tooling, and infrastructure self-service capabilities.
Define operational best practices, reliability standards, release governance, and infrastructure lifecycle management processes.
Mentor and help scale the future reliability and platform engineering organization.
Required Experience
7–12 years of experience in Infrastructure Engineering, Reliability Engineering, SRE, Platform Engineering, or Distributed Systems environments.
Strong experience operating mission-critical production systems in high-availability environments.
Deep expertise in Linux systems, networking, and distributed infrastructure architecture.
Strong hands-on experience with Kubernetes and containerized production environments.
Strong programming ability in Go or Python.
Experience with Kafka, Terraform, Vault, Consul, CI/CD pipelines, and infrastructure automation frameworks.
Strong understanding of observability platforms including Prometheus, Alertmanager, logging, and tracing systems.
Proven expertise debugging complex distributed systems and low-latency production environments.
Experience in trading systems, fintech, exchanges, HFT firms, or other real-time infrastructure environments is highly preferred.
Strong ownership mindset with the ability to operate in high-performance engineering environments.