DevOps Engineering Specialist
Listed on 2026-02-16
IT/Tech
Systems Engineer, Cloud Computing, IT Support, SRE/Site Reliability
Our data and API platforms underpin analytics and operational services for critical national infrastructure and consumer‑facing businesses. We run a large‑scale hybrid data lake spanning on‑prem and cloud components, using Kafka for streaming, the ELK Stack for logging and search analytics, and Prometheus for metrics and alerting. We also build and operate API‑driven application platforms deployed on a Kubernetes ecosystem, integrating the Core network with aggregators that offer Network‑as‑a‑service capability.
This DevOps Engineer role is responsible for ensuring these platforms are secure, observable, scalable and reliable, enabling teams to ship changes safely, troubleshoot quickly and operate with confidence. The role reflects the DevOps principle of taking services "through to live" and maintaining SLA and operational commitments through automation, monitoring and strong engineering practices.
- Operate and evolve a hybrid data lake (AWS + on‑prem) ensuring performance, resilience and secure connectivity.
- Manage and optimize ELK Stack (Elasticsearch, Logstash, Kibana) for log ingestion, indexing, retention, performance tuning, cluster health and query reliability.
- Build and maintain Prometheus‑based observability: metrics pipelines, alert rules, recording rules, dashboards (e.g., Grafana) using consistent standards (labels, correlation IDs, golden signals, SLO‑aligned dashboards).
- Manage and tune Kafka clusters and ecosystem components (topics, partitions, replication, consumer lag monitoring, ACLs, capacity planning).
- Provide platform integration support for service‑to‑service communication (ingress, API gateway patterns, service mesh where applicable) and ensure API lifecycle hygiene (versioning, deprecation, documentation).
- Contribute to CI/CD practices and automation (pipeline reliability, environment promotion, configuration management, GitOps where appropriate).
- Work with developers and product teams to ensure clean API lifecycle practices (versioning, documentation, deprecation and backward compatibility).
- Ensure logging/metrics are actionable and support rapid incident triage (clear alerts, meaningful thresholds, low noise, good routing).
- Collaborate with security, network and architecture stakeholders to ensure platform controls meet required standards.
- Strong Linux fundamentals and troubleshooting (system performance, networking, storage).
- Hands‑on Kubernetes experience in production (deployments, upgrades, debugging, cluster/workload operations, managing secrets, network policies).
- Automation mindset: scripting (Python/Bash) plus one or more of Terraform, Ansible, Helm, Kustomize or GitOps.
- GitOps and modern engineering practices (PRs, code review, release discipline).
- Strong knowledge of API gateway/service mesh patterns and secure ingress.
- Experience designing observability for serverless systems (logs/metrics/traces) and implementing distributed tracing and dashboards using open standards and tooling such as Elastic and Grafana.
- Access, use, and disclose information only as required for the job; ensure appropriate safeguards and adherence to Information Security policies.
- AWS Cloud Practitioner Certification.
- Familiarity with ITIL/incident management and change practices (or equivalent experience).
- Excellent verbal and written communication and interpersonal skills.
- Kubernetes certification (e.g., CKA/CKAD).
- Good understanding of foundational AWS services such as EKS, IAM, VPC, S3 and CloudWatch, and of hybrid connectivity patterns (e.g., VPN/Direct Connect where applicable).
- Sound understanding of authentication and authorisation patterns, including OpenID Connect (OIDC), OAuth 2.0 and LDAP/Active Directory and how these integrate with Kubernetes (e.g., OIDC‑based SSO, RBAC mapping, identity federation) and AWS identity/access controls.
BT Group was the world's first telco and our heritage in the sector is unrivalled. As home to several of the UK's most recognised and cherished brands – BT, EE, Openreach and Plusnet – we have always played a critical role in creating the future, and we have reached an inflection point in the transformation of our business. Over the next two…