Ops Lead Engineer – Big Data Platform
Job in Greater London, London, W1B, England, UK
Listed on 2026-02-17
Listing for:
Lebara Media Services Private Ltd
Full Time position
Job specializations:
- IT/Tech: Cloud Computing, Data Engineer, SRE/Site Reliability
Job Description & How to Apply Below
Role Summary:
We are seeking a strategic and hands-on Operations Lead to ensure the resilience, performance, and cost-effectiveness of our Azure-based data platform. This role is at the heart of our data ecosystem, combining platform reliability, incident response, SLA management, cost optimization (FinOps), and deployment oversight.
You will be the single point of contact for operational issues, driving rapid resolution during outages, leading communications with stakeholders, and shaping the processes that keep our platform running smoothly and efficiently.
Responsibilities:
- Own the day-to-day stability and performance of our Azure data platform (Synapse, Databricks, ADF, Power BI).
- Act as the primary point of contact for incidents and outages, driving resolution, root cause analysis, and clear stakeholder communication.
- Define, implement, and enforce SLAs for critical pipelines, datasets, and reporting assets.
- Run FinOps forums with business stakeholders to improve cost transparency, accountability, and efficiency across the platform.
- Oversee CI/CD pipelines and deployments, ensuring reliable, safe, and compliant delivery of data platform changes.
- Champion monitoring, observability, and automation to detect and resolve issues proactively while reducing manual intervention.
- Develop and maintain operational runbooks, escalation protocols, and incident playbooks to strengthen resilience.
- Partner with data engineering and analytics teams to align operational strategy with business goals and future platform roadmap.
Requirements:
- Operational Leadership: Proven track record in leading operations for large-scale data platforms, ensuring stability, performance, and stakeholder trust.
- Incident & SLA Management: Skilled in incident triage, root cause analysis, escalation handling, and defining/enforcing SLAs with cross-functional teams.
- Azure Data Stack: Hands-on experience with Azure Synapse, Databricks, ADF, and Power BI, with the ability to guide best practices and optimisations.
- Automation & CI/CD: Familiar with CI/CD processes and automation to streamline deployments and reduce manual intervention.
- FinOps Mindset: Experience in cost management, usage reporting, and running forums with business stakeholders to drive accountability and efficiency.
- Monitoring & Observability: Knowledge of modern monitoring, alerting, and data quality frameworks to ensure proactive platform health management.