Site Reliability Engineer (SRE)
Job in
Mount Pleasant, London, Greater London, EC1A, England, UK
Listed on 2026-01-08
Listing for:
Charles Simon Associates Ltd
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, Systems Engineer, SRE/Site Reliability, Network Engineer
Job Description & How to Apply Below
Location: Mount Pleasant
Site Reliability Engineer – (SRE, Site Reliability Engineer, Terraform, AKS, Azure, Kubernetes, PowerShell, Python, Bash, Datadog, Monitoring Tools) – Permanent – Remote
Charles Simon Associates are currently recruiting for a Site Reliability Engineer on a permanent basis. This role is for a global business headquartered in the City of London.
Candidates must be British citizens due to security clearance requirements.
Location: Remote, with some travel to London
Salary: Up to £125,000 per annum
Skills/Requirements for the Site Reliability Engineer:
* Extensive SRE experience within previous roles
* Strong Terraform skills
* Proven Kubernetes and AKS experience
* Experience in creating and modifying Terraform deployments in live environments
* Experience with monitoring solutions, ideally Datadog, though Azure Application Insights, Log Analytics or Grafana are also relevant
* Scripting skills for automation in PowerShell, Python or Bash
* Experience with web-based applications
Desirable Skills:
* Knowledge or commercial experience of Microservices Architecture
* Kanban
* Any prior experience of working with Puppet and Chef would be advantageous
Start date is ASAP for the Site Reliability Engineer
The Site Reliability Engineer will be responsible for:
* Defining and enforcing service-level objectives (SLOs), service-level indicators (SLIs), and service-level agreements (SLAs) to ensure reliability targets are measurable and aligned with business expectations
* Implementing incident response frameworks, including runbooks, postmortems, and blameless RCA processes to drive continuous improvement
* Integrating observability tooling (e.g. Prometheus, Grafana, Datadog, OpenTelemetry) to enable proactive detection and resolution of system anomalies
* Managing infrastructure as code (IaC) using tools like Terraform, Pulumi, or CloudFormation to ensure repeatable, auditable deployments
* Optimizing cost and resource utilization across cloud environments through rightsizing, autoscaling, and lifecycle policies
* Driving chaos engineering initiatives to test system resilience under failure conditions and validate recovery strategies
* Championing security best practices within infrastructure, e.g. secrets management, IAM policies, and vulnerability scanning
* Collaborating with DevOps and platform teams to build paved-road deployment patterns and internal developer portals
* Leading capacity planning and load testing efforts to anticipate scaling needs and prevent bottlenecks
* Contributing to architectural decisions that impact reliability, latency, and fault domains across distributed systems
Please send an up-to-date copy of your CV to be considered for the Site Reliability Engineer role.