Key Responsibilities:
Technical Leadership
Provide expert guidance and hands-on technical leadership to Observability engineers, enabling adoption of best-practice instrumentation, telemetry patterns, and performance insights.
Lead service onboarding activities, defining standard approaches for metrics, logs, traces, dashboards, and query patterns.
Act as the technical authority on observability across engineering teams, promoting a culture of engineering excellence.
Strategy & Governance
Own the platform observability strategy and roadmap, ensuring alignment with digital engineering objectives and reliability goals.
Govern telemetry lifecycle, including data collection, retention, access controls, classification, and quality assurance.
Define and maintain organisation-wide observability standards, guidelines, and engineering guardrails.
Design, Implementation & Delivery
Architect and implement scalable, resilient observability pipelines for logs, metrics, traces, and events across distributed systems and multi-environment platforms.
Standardise instrumentation libraries, agents/collectors, alerting frameworks, dashboards, and SLO/SLA models.
Oversee development of reliability indicators (SLIs/SLOs) and ensure consistent adoption across teams.
Automation & CI/CD
Embed observability configuration, alerting, and pipeline validations into CI/CD workflows using configuration-as-code patterns.
Govern observability-related pipeline changes, approvals, and quality gates to ensure robust, secure, and compliant delivery.
Collaboration & Enablement
Partner with product engineering, SRE, platform, and security teams to improve service health, triage complex issues, and drive operational maturity.
Facilitate engineering education through tooling demos, office hours, patterns documentation, and cross-team enablement.
Communicate telemetry insights, reliability posture, and platform risks to senior stakeholders clearly and effectively.
Shift Responsibilities & Operational Support
Work in rotational shifts as required.
Participate in on-call rotations to respond to and resolve high-severity incidents in a timely manner.
The Person
Deep expertise in observability domains: distributed tracing, diagnostic logging, high-volume telemetry pipelines, metrics modelling, and reliability frameworks.
Strong leadership capabilities with experience influencing engineering practices across multiple teams.
Skilled in designing observability platforms for large-scale, distributed, or multi-cloud systems.
Proficient in reliability engineering practices (SLIs/SLOs, error budgets) and data-driven decision making.
Experienced in automation and configuration-as-code for observability components.
Able to work independently, drive adoption, and champion engineering excellence.
Able to switch between technical discussions with team members and non-technical communication with stakeholders.
Experience architecting observability for large-scale distributed systems, regardless of cloud provider or vendor tool.
Strong experience with one or more observability stacks.
Extensive knowledge of CI/CD pipelines using Azure DevOps or equivalent.
Experience working in an enterprise environment.