Site Reliability Engineer (SRE) Job, London area, Mount Pleasant, England, UK, IT/Tech

Position: Site Reliability Engineer (SRE)
Location: Mount Pleasant

Site Reliability Engineer – (SRE, Site Reliability Engineer, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) – Permanent – Remote

Charles Simon Associates are currently recruiting for a Site Reliability Engineer (SRE) on a permanent basis. The role is with a global business headquartered in the City of London.

Candidates must be British citizens due to security clearance requirements.

Location:

Remote, with some travel to London

Salary:
Up to £125,000 per annum

Skills/Requirements for the Site Reliability Engineer:

* Extensive SRE experience in previous roles

* Strong Terraform skills

* Proven Kubernetes and AKS experience

* Experience creating and modifying Terraform deployments in live environments

* Experience with monitoring solutions, ideally Datadog; alternatively Azure Application Insights, Log Analytics or Grafana

* Scripting skills for automation in PowerShell, Python or Bash

* Experience with web-based applications

Desirable Skills:

* Knowledge or commercial experience of Microservices Architecture

* Kanban

* Any prior experience of working with Puppet and Chef would be advantageous

The start date for the Site Reliability Engineer role is ASAP.

The Site Reliability Engineer will be responsible for:

* Designing and enforcing service-level objectives (SLOs), SLIs, and SLAs to ensure reliability targets are measurable and aligned with business expectations

* Implementing incident response frameworks, including runbooks, postmortems, and blameless RCA processes to drive continuous improvement

* Integrating observability tooling (e.g. Prometheus, Grafana, Datadog, OpenTelemetry) to enable proactive detection and resolution of system anomalies

* Managing infrastructure as code (IaC) using tools such as Terraform, Pulumi or CloudFormation to ensure repeatable, auditable deployments

* Optimizing cost and resource utilization across cloud environments through rightsizing, autoscaling, and lifecycle policies

* Driving chaos engineering initiatives to test system resilience under failure conditions and validate recovery strategies

* Championing security best practices within infrastructure, such as secrets management, IAM policies and vulnerability scanning

* Collaborating with DevOps and platform teams to build paved-road deployment patterns and internal developer portals

* Leading capacity planning and load testing efforts to anticipate scaling needs and prevent bottlenecks

* Contributing to architectural decisions that impact reliability, latency, and fault domains across distributed systems

Please send an up-to-date copy of your CV to be considered for the Site Reliability Engineer role.
