Senior Software Engineer; AWS
Listed on 2026-02-05
IT/Tech
Data Engineer, Cloud Computing, SRE/Site Reliability, Systems Engineer
Location: Newcastle upon Tyne
We’re looking for a Senior Data Ops / Dev Ops Engineer to design, build, and operate the reliability layer underpinning Sage’s core data platforms, including large-scale batch and streaming data systems.
In this role, you’ll own the observability, monitoring, and operational resilience of cloud-native data infrastructure and streaming pipelines, ensuring that data flows, whether event-driven or batch, are performant, reliable, and predictable in production.
This is a hybrid role requiring 3 days per week in our Newcastle office.
- 30 days: Get familiar with Sage’s data platform architecture, including batch and streaming pipelines, cloud infrastructure, and existing operational tooling. Understand current monitoring, alerting, logging, and incident response practices, along with data reliability SLAs, failure modes, and engineering standards.
- 60 days: Begin actively improving observability across key data systems, including dashboards, alerts, and pipeline health checks. Contribute to the operation and reliability of batch and streaming workloads, applying Infrastructure as Code, incident learnings, and Data Ops best practices.
- 90 days: Own major aspects of the data platform’s operational reliability and observability strategy. Drive improvements in alert quality, system resilience, pipeline reliability, and operational maturity. Mentor team members on Data Ops and Dev Ops practices, and help shape how data platforms are built and operated going forward.
You’ll work alongside data engineers, AI specialists, product managers, and designers in a highly collaborative environment.
The team focuses on building scalable internal platforms that power data-driven decision-making and AI-enabled products across Sage.
How success will be measured:
- Delivery of reliable, scalable automation and operational capabilities across data ingestion, processing, and platform services.
- Measurable improvements in platform observability, including clear dashboards and actionable alerts tied to data SLAs such as freshness, latency, and availability.
- Reduction in operational toil through Infrastructure as Code, repeatable deployments, and improved self-service onboarding for engineering teams.
- Improved incident response outcomes, including faster detection, faster recovery, and fewer recurring issues through effective post-incident follow-ups.
- Strong operational quality across environments, with platforms operating securely, predictably, and in line with governance and compliance requirements.
- Increased visibility into system health across batch and streaming data pipelines.
- Deep expertise operating a modern Product Data Platform / Data Hub supporting both batch and streaming workloads.
- Hands-on experience with streaming and distributed data processing systems and their operational characteristics.
- Strong exposure to observability engineering for data systems, including metrics, logs, traces, and pipeline health monitoring.
- Experience shaping platform reliability standards, including alerting strategies, runbooks, and on-call readiness.
- Practical cloud infrastructure ownership across storage, compute, and analytics layers used by large-scale data platforms.
- You’ll design and operate monitoring and alerting that provides real-time visibility into pipeline health, SLA breaches, and platform behaviour.
- You’ll improve the reliability of batch and streaming data ingestion and processing workloads, focusing on failure recovery and operational robustness.
- You’ll build and maintain cloud infrastructure and deployment automation to keep environments consistent, secure, and repeatable.
- You’ll work closely with data engineering and product teams to improve platform onboarding and reduce the effort required to adopt shared data capabilities.
- You’ll help strengthen governance, compliance, and auditability by improving observability, documentation, and operational controls across the platform.
- Strong experience as a Data Ops, Dev Ops, or Platform Engineer supporting production data systems.
- Proven expertise in observability tooling, including monitoring, logging,…