Site Reliability Engineer Job: London area, Greater London, England, UK (IT/Tech)

Location: Greater London

Overview

At FutureLearn, we’re passionate about the power of lifelong learning. We help learners from all over the world progress in their careers and invest in their futures.

We truly believe that up-skilling is a worthy investment, and we hope to empower our learners to take control of their careers through personalised learning pathways – giving them progress at their fingertips.

Partnering with 260+ world-class educational partners, including prestigious universities, global brands and industry partners, we offer our 20 million-strong learner community the opportunity to discover and access flexible, high-quality online courses and degrees.

We’re not here just to teach new skills (although we do that well); we want to help transform lives. FutureLearn is looking to build our teams with people who share our passion for lifelong learning, career empowerment and education for all. If that sounds like you, get in touch. You could help us achieve our biggest goal yet: becoming the world’s best AI-powered, career-based learning platform and OPM.

What is the opportunity?

You will play a key role in maintaining and evolving FutureLearn’s platform to ensure it is highly available, reliable, secure, and scalable as the business grows. Working closely with the Lead Technical Architect, SREs, and software engineers, you’ll help shape the technical direction of our infrastructure while fostering a strong DevOps culture that enables teams to deliver high-quality services safely and efficiently.

We’re looking for people who are curious, thoughtful, and eager to learn, with a genuine desire to use their experience to support and enable others. You’ll need to communicate clearly, work effectively in a collaborative environment, and be comfortable operating autonomously when needed.

What does success look like?

Maintaining platform availability and reliability

Partner with the Lead Technical Architect to set and evolve the technical direction of our infrastructure, ensuring it scales to support business growth in a cost-effective manner.
Take responsibility for a platform that is secure, resilient, scalable, and cost-efficient.
Develop deep expertise in FutureLearn’s technology stack and its practical application, including AWS (RDS, ECS, EC2, S3, Lambda), Cloudflare, Redis, DNS, Docker, and the wider infrastructure platform.
Use, maintain, and continuously improve observability tooling such as Datadog and AWS CloudWatch to monitor platform health, troubleshoot performance issues, and identify root causes.
Respond to incidents affecting the platform, including participation in the on-call rota.
Ensure disaster recovery and incident response processes are regularly tested and improved, designing exercises informed by industry best practices such as gamedays and chaos engineering.
Act as an expert in the tools used to manage infrastructure and CI/CD systems, including Terraform, GitHub Actions, and scripting languages.

Building a DevOps culture at FutureLearn

Own and continuously improve the developer experience, supporting SREs in refining how the FutureLearn application is developed, tested, and deployed so it is safer, faster, and easier to work on.
Champion CI/CD best practices, enabling engineers to reliably deliver high-quality services to production.
Empower software engineers to understand how to get their code into production and how to identify and debug performance issues.
Support engineers through pairing, teaching, mentoring, coaching, and code reviews, demonstrating the practices of an effective engineer.
Act as a subject matter expert for infrastructure and operational concerns across FutureLearn.

What you bring to the table

Essential experience and skills
Experience architecting and supporting cloud-native web application infrastructure.
Hands-on experience with containers and schedulers (Amazon ECS).
Experience using automated configuration management and infrastructure-as-code tools (Terraform).
A deep understanding of Linux, networking, and security.
Experience supporting database administration and performance, with a focus on scalability and maintainability.
A strong interest in automation and…

