Lead - Capacity & Automation; SRE
Job in
Manchester, Greater Manchester, M9, England, UK
Listed on 2026-01-09
Listing for:
BT Group
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Cloud Computing
Job Description & How to Apply Below
# Lead - Capacity & Automation (SRE)
Job Req Date: 5 Jan 2026
Function: Software Engineering
Unit: Networks
Location: New Bailey, Manchester, United Kingdom
Salary: Competitive with Great Benefits
## Why this job matters
This role is critical to the success of our Private Cloud platform. As the single accountable owner for capacity management, you will ensure that our VMware-based infrastructure is reliable, scalable, and cost-efficient, enabling the business to deliver programmes without risk of capacity-related delays. By applying Agile product ownership and Site Reliability Engineering (SRE) principles, you will transform capacity management into a proactive, data-driven capability that continuously evolves to meet business demand.
You will define and implement forecasting models, automation, and guardrails that prevent saturation and optimise resource utilisation. Your work will directly impact platform reliability, programme delivery, and financial efficiency, making this role a cornerstone of our technology strategy. Through telemetry, automation, and governance, you will provide the insights and controls that keep private cloud (EC.3) resilient, cost-effective, and ready for future growth.
**This role is hybrid (3 days in the office) in Birmingham, London, or Manchester**
## What you’ll be doing
* Own the Private Cloud “EC.3” Capacity Management Platform – act as the single accountable owner for capacity planning, forecasting, modelling, and optimisation across the VMware-based Enterprise Cloud v3 environment.
* Define and Deliver the Capacity Roadmap – translate business demand and programme milestones into a prioritised backlog of features and automation, using Agile delivery practices.
* Implement SRE Guardrails – establish SLIs, SLOs, and error budgets for infrastructure-related reliability; ensure proactive risk management.
* Develop Forecasting Models – build accurate short-, medium-, and long-term capacity forecasts using telemetry and scenario analysis to prevent saturation and ensure headroom.
* Automate Capacity Workflows – reduce manual toil by creating scripts, policies, and integrations for rightsizing, placement, and quota enforcement using PowerCLI, APIs, and IaC.
* Maintain Real-Time Telemetry & Dashboards – provide a single source of truth for utilisation, trends, and optimisation opportunities through VMware Aria Operations (vROps) and reporting tools.
* Optimise Cost and Efficiency – align with FinOps principles to deliver showback/chargeback reporting, identify waste, and implement cost-saving measures without compromising reliability.
* Integrate with ITSM & Governance – ensure ServiceNow CMDB accuracy, automate request fulfilment, and maintain compliance with capacity policies and audit requirements.
* Collaborate Across Teams – work closely with Architecture, Programme Delivery, Finance, and Operations to align capacity decisions with strategic objectives and risk appetite.
* Continuously Improve – evolve the capacity management capability through iterative enhancements, stakeholder feedback, and adoption of emerging best practices.
## Leadership Accountabilities
* Vision & Strategy – Define and communicate the long-term vision for capacity management on EC.3, ensuring alignment with business objectives and technology strategy.
* Ownership & Accountability – Act as the single point of accountability for capacity planning, forecasting, and optimisation across the VMware platform.
* Influence & Stakeholder Engagement – Build strong relationships with senior stakeholders, programme leads, and cross-functional teams to drive decisions and secure buy-in.
* Agile Leadership – Champion Agile ways of working, ensuring backlog prioritisation, iterative delivery, and continuous improvement of the capacity capability.
* Reliability Governance – Embed SRE principles into leadership decisions, balancing innovation with risk management through SLIs, SLOs, and error budgets.
* Financial Stewardship – Lead cost optimisation initiatives aligned with FinOps principles, ensuring efficient use of resources and transparent reporting.
* Team Enablement – Mentor and guide engineers and…