Cloud SRE & DevOps Engineer
Listed on 2026-06-11
IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, SRE/Site Reliability
Soros Fund Management LLC (SFM) is a global asset manager and family office founded by George Soros in 1970. With $28 billion in assets under management (AUM), SFM serves as the principal asset manager for the Open Society Foundations, one of the world’s largest charitable foundations dedicated to advancing justice, human rights, and democracy.
Team Overview
Reports To:
Head of Cloud SRE & DevOps Engineering
Other Key Relationships:
Cybersecurity analysts, Software Development engineers
We are seeking a mid‑to‑senior level engineer to join our Cloud SRE & DevOps Engineering team in London, focused on building, operating, and evolving the cloud infrastructure and delivery platforms that support our trading and investment systems.
This role sits at the intersection of Cloud Engineering, SRE, DevOps, Platform Engineering, and Production Engineering, and is designed for individuals who take ownership of systems running in production. You will be responsible for designing and operating resilient, scalable environments across AWS and Kubernetes, while enabling engineering teams through modern DevOps and GitOps practices.
The position reflects a strong Site Reliability Engineering (SRE) mentality, with emphasis on reliability, observability, automation, and operational excellence. You will play a key role in advancing the firm’s cloud transformation strategy, ensuring systems are built for performance, stability, and scale.
This is a hands‑on role requiring deep technical expertise, accountability for production systems, and a mindset oriented toward continuous improvement, risk management, and engineering efficiency. The role also contributes to evolving platform capabilities supporting AI and data‑driven workloads.
You will work closely with software engineering, cybersecurity, and data teams to deliver secure, scalable, and high‑performing systems, while improving developer experience and platform maturity across the organization.
Major Responsibilities
Cloud Infrastructure & Kubernetes
- Design, build, and operate scalable AWS‑based infrastructure supporting trading systems
- Work across hybrid or multi‑cloud environments (AWS and Azure)
- Manage and optimize Kubernetes environments (EKS), ensuring resilience, scalability, and performance
- Utilize Kubernetes ecosystem tooling (e.g., Helm) to support application deployment and lifecycle management
- Develop reusable infrastructure using Terraform and infrastructure as code principles
Database Cloud Management & Administration
- Manage the automated delivery of Snowflake configurations through CI/CD pipelines
- Assist with the administration of Microsoft SQL Server, Snowflake, and AWS Aurora
CI/CD & DevOps Practices
- Design and maintain CI/CD pipelines using tools such as GitHub Actions
- Promote Git‑based workflows and support GitOps practices (e.g., ArgoCD)
- Improve deployment reliability, consistency, and engineering velocity
Production Engineering & Reliability
- Own and operate business‑critical production systems with a strong focus on uptime, performance, and risk mitigation
- Troubleshoot and resolve complex issues across distributed systems, cloud infrastructure, and Kubernetes environments
- Implement and enhance monitoring, logging, and alerting using tools such as Datadog, AWS CloudWatch, Geneos, and LogicMonitor
- Apply cloud security best practices, including IAM, secrets management, and vulnerability scanning
- Support and optimize relational database systems (e.g., PostgreSQL, MySQL, SQL Server, Aurora), ensuring performance and high availability
- Contribute to backup, recovery, and resilience strategies across infrastructure and data layers
- Drive improvements aligned with SRE principles, including reliability, observability, and operational maturity
Developer Enablement
- Partner with engineering teams to streamline development and deployment workflows
AI & Emerging Workloads
- Support infrastructure for AI/ML and data‑driven workloads, including scalable compute and data processing patterns
- Enable deployment patterns for modern, data‑intensive applications
- Evaluate emerging technologies relevant to AI‑enabled platforms
Core Technical Expertise
- Strong hands‑on…