Senior Site Reliability Engineer (SRE)
Location: Wokingham (2 days/week onsite)
Type: Inside IR35
Rate: Up to £70.00 per hour (DOE)
We’re looking for a Senior Site Reliability Engineer (SRE) to lead efforts in maintaining the reliability, performance, and scalability of mission-critical platforms and services. This role is ideal for someone who thrives at the intersection of software engineering, infrastructure, automation, and incident response.
You’ll be instrumental in defining and implementing the standards and systems that keep applications running smoothly across cloud and hybrid environments, including OpenShift clusters.
What You’ll Be Responsible For
As a Senior SRE, you will:
- Ensure high availability, strong performance, and low latency for critical systems across Azure, AWS, and OpenShift.
- Design and implement robust observability systems (logging, monitoring, alerting) to detect and resolve issues proactively.
- Lead and evolve incident management processes—runbooks, comms, postmortems, and root cause analysis.
- Define and monitor SLIs, SLOs, and error budgets to balance innovation with stability.
- Automate manual processes through infrastructure-as-code, scripting, and modern CI/CD pipelines.
- Mentor engineering teams on best practices for deployment, reliability, scalability, and incident preparedness.
- Support and scale OpenShift-based containerized applications, including upgrade strategies, patching, and workload optimization.
- Act as the senior escalation point for outages and critical incidents.
- Lead post-incident reviews and implement long-term remediation plans.
- Communicate platform health and risk posture to stakeholders at all levels.
- Build and improve CI/CD pipelines using tools like Azure DevOps, GitHub Actions, Jenkins, and GitLab.
- Design scalable, fault-tolerant infrastructure with IaC tools (Terraform, Bicep).
- Create internal tools and automation to accelerate development and reduce operational toil.
- Architect cloud and container infrastructure, with a focus on OpenShift, Kubernetes, and hybrid deployments.
- Collaborate with engineering, architecture, and security teams to embed reliability into the SDLC.
- Promote advanced deployment strategies (blue-green, canary, rolling updates) and rollback readiness.
- Drive a culture of reliability, observability, and operational excellence across engineering teams.
Hands-on experience with many of the following is expected:
- Cloud & Containers: Azure, AWS, OpenShift, Kubernetes, Docker, App Services, IaaS (EC2, VMs)
- CI/CD & Automation: Terraform, Bicep, Azure DevOps, Jenkins, GitHub Actions, GitLab
- Observability: Prometheus, Grafana, Datadog, ELK, Splunk, Application Insights, CloudWatch
- Languages & Scripting: Python, C#, Bash, PowerShell
- Networking: DNS, SSL/TLS, load balancing, WAF, proxies, CDN, Azure App Gateway
- Databases: MSSQL, PostgreSQL, MongoDB, Cosmos DB, DynamoDB
- OS & Systems: Windows, Linux, Nginx, IIS
- 5+ years of experience in SRE, DevOps, or production engineering roles.
- Expertise operating in high-availability, fast-paced production environments.
- Solid engineering foundation with experience reading and writing production code.
- Hands-on experience deploying, supporting, and scaling OpenShift environments.
- Proven track record of leading incident responses and improving system reliability.
- Strong collaboration and mentoring abilities across infrastructure, development, and security teams.
- Ability to balance operational risk with engineering velocity.
- Strong communication skills across technical and non-technical audiences.
- A passion for automating everything and eliminating manual work.
- A mindset of ownership, continuous improvement, and technical leadership.
If you’re a senior SRE with OpenShift experience and a drive to solve complex operational challenges, we’d love to hear from you.