Cloud Engineer
Job in
Glasgow, Glasgow City Area, G1, Scotland, UK
Listed on 2026-02-02
Listing for:
Pyramid Consulting, Inc
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, SRE/Site Reliability
Job Description & How to Apply Below
Job Title: AWS Site Reliability Engineer (Data Platform)
Role Summary
We are looking for an AWS Site Reliability Engineer (SRE) to support and scale a cloud-native data platform built on AWS, Snowflake, and Databricks. The role focuses on driving reliability through automation, disaster recovery (DR) testing, resiliency engineering, observability, and proactive SLO/SLI/SLA management.
- Design, build, and maintain automation for infrastructure provisioning, platform operations, and incident response using IaC and CI/CD.
- Lead resiliency and disaster recovery planning, including regular DR drills, failure testing, and recovery validation across AWS and data platform components.
- Define, implement, and manage SLIs, SLOs, and SLAs for critical data pipelines and platform services; use error budgets to guide reliability improvements.
- Build and operate robust observability solutions (metrics, logs, traces, alerts) for AWS services, Snowflake, and Databricks workloads.
- Partner with data engineering and platform teams to embed reliability-by-design into architecture and delivery practices.
- Perform root cause analysis (RCA) and drive continuous improvement to reduce toil and improve platform availability and performance.
- Own and drive resolution of incidents and service requests raised by consumer teams, providing operational support for platform usage while identifying recurring issues and automating fixes to improve reliability and user experience.
- Practical knowledge of SRE principles, including SLO/SLI/SLA design and error budgets.
- Strong experience with AWS (e.g., EC2, S3, IAM, VPC, CloudWatch) in production environments.
- Experience with observability tools and monitoring/alerting best practices.
- Hands-on experience with automation and IaC (Terraform, CloudFormation, CDK) and scripting (Python, Bash).
- Exposure to data platforms such as Snowflake and/or Databricks.
- Experience running DR tests, chaos engineering, or resiliency testing in cloud environments.
- Familiarity with CI/CD pipelines and GitOps practices.
- Background supporting large-scale data or analytics platforms.