Lead Site Reliability Engineer Global IT
Listed on 2026-06-04
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, IT Project Manager
Location: Greater London
We are a UK fintech creating successful neobanks in emerging markets in partnerships with local traditional banks. The mission is to make banking services accessible, simple and fun to use worldwide and the goal is to launch neobanks in 50+ markets, serving 100m+ customers.
Our success builds upon a best-in-class product, customer experience, emotional engagement, viral marketing and deep credit-decisioning expertise across our product suite covering credit, payments, savings and investments. One of our founders also previously co-founded a highly successful Eastern European neobank with a multi-million customer base.
We launched our first market with Leobank in Azerbaijan in 2021, where we’ve already taken a leading market position. Our next market was Vietnam, where we launched Liobank in early 2023 and have also reached strong traction. We have several more markets on the roadmap in the next 12 months and are starting to build out teams there.
Why Fintech Farm is a Great Place to Be
Fintech Farm is a leading fintech with a clear mission and expansion goals. We are committed to delivering innovative banking solutions worldwide.
Our Ambition
We are looking to become a leading consumer digital bank brand in each market we operate, making it easy for consumers to interact with their money. You could be a part of this exciting journey.
Our Culture
Customers. We always go above and beyond to provide an amazing customer experience. We serve our customers the way we would want our mom to be served. And who said that banking has to be boring? We make our apps not just easy but fun to use.
People. We are all business partners in our company. Each of us thinks big, acts as if we own the place and never takes ‘no’ for an answer. We work with strong individuals whom we empower and trust rather than micromanage. Common sense rather than formal policies prevails in all that we do. We always stay curious and open-minded. We embrace the ‘we over me’ culture.
Your Role
As a Lead SRE, you will drive the reliability, scalability, and performance of our multi-market microservices infrastructure. You’ll lead a team of engineers focused on automating operations, improving observability, and ensuring zero-downtime service delivery across our cloud and on-prem environments. Your mission is to build resilient systems and empower development teams with the tools and practices needed to operate safely and efficiently at scale.
What You Will Be Doing
- Build and define the SRE function, establishing best practices for reliability, observability, and incident management across the platform
- Manage and optimize Kubernetes clusters (AWS EKS and on-prem), ensuring scalability, cost efficiency, and resilience
- Oversee the observability and alerting stack, including Prometheus, Grafana, Alertmanager, ELK, and VictoriaMetrics
- Implement and refine monitoring and alerting strategies, establishing actionable SLIs/SLOs and effective on-call processes
- Drive improvements in infrastructure as code using Terraform/Terragrunt
- Collaborate closely with software and DevOps teams to ensure production readiness and reliable CI/CD delivery pipelines
- Participate in and enhance incident management processes, including post-mortems and continuous improvement initiatives
- Lead efforts in security hardening, compliance, and cost optimization across environments
- Contribute to strategic planning of the infrastructure roadmap and technology evolution
Who You Are
- A leader who takes ownership and inspires a reliability-focused culture
- Obsessed with system stability, scalability, and measurable performance
- Strong communicator who can translate technical concepts into clear direction
- Calm under pressure, analytical in incident response, and proactive in prevention
- Passionate about mentoring engineers and driving operational excellence
What You Bring
- 6+ years in DevOps/SRE roles, with at least 2 years in a technical leadership position
- Deep expertise in Kubernetes (EKS and on-prem), Prometheus, Grafana, and alerting systems
- Strong background in AWS and Infrastructure as Code (Terraform/Terragrunt)
- Experience designing and maintaining CI/CD pipelines (GitLab CI/CD or GitHub Actions)
- Proficiency in scripting languages (Python, Bash) and automation tooling (Ansible, Helm)
- Familiar with GitOps principles (Flux, ArgoCD)
- Solid understanding of networking, security, and observability practices
- Proven ability to lead incident response and drive cross-functional reliability improvements
- Exposure to DevSecOps standards, compliance, and audit processes (ISO 27001, SOC 2, PCI DSS)
What We Offer
- Competitive salary (negotiable based on seniority and leadership scope)
- Share options
- Opportunity to shape the SRE function in a fast-scaling fintech start-up
- A collaborative environment that values autonomy, innovation, and impact