Observability Platform Architect Job, Gurugram area, Uttar Pradesh, India, IT/Tech

Job Description

Key Responsibilities:

Technical Leadership

Provide expert guidance and hands-on technical leadership to Observability engineers, enabling adoption of best-practice instrumentation, telemetry patterns, and performance insights.
Lead service onboarding activities, defining standard approaches for metrics, logs, traces, dashboards, and query patterns.
Act as the technical authority on observability across engineering teams, promoting a culture of engineering excellence.

Strategy & Governance

Own the platform observability strategy and roadmap, ensuring alignment with digital engineering objectives and reliability goals.
Govern telemetry lifecycle, including data collection, retention, access controls, classification, and quality assurance.
Define and maintain organisation-wide observability standards, guidelines, and engineering guardrails.

Design, Implementation & Delivery

Architect and implement scalable, resilient observability pipelines for logs, metrics, traces, and events across distributed systems and multi-environment platforms.
Standardise instrumentation libraries, agents/collectors, alerting frameworks, dashboards, and SLO/SLA models.
Oversee development of reliability indicators (SLIs/SLOs) and ensure consistent adoption across teams.

Automation & CI/CD

Embed observability configuration, alerting, and pipeline validations into CI/CD workflows using configuration-as-code patterns.
Govern observability-related pipeline changes, approvals, and quality gates to ensure robust, secure, and compliant delivery.

Collaboration & Enablement

Partner with product engineering, SRE, platform, and security teams to improve service health, triage complex issues, and drive operational maturity.
Facilitate engineering education through tooling demos, office hours, patterns documentation, and cross-team enablement.
Communicate telemetry insights, reliability posture, and platform risks to senior stakeholders clearly and effectively.

Shift Responsibilities & Operational Support

Work in rotational shifts as required.
Participate in on-call rotations to respond to and resolve high-severity incidents in a timely manner.

The Person

Deep expertise in observability domains: distributed tracing, diagnostic logging, high-volume telemetry pipelines, metrics modelling, and reliability frameworks.
Strong leadership capabilities with experience influencing engineering practices across multiple teams.
Skilled in designing observability platforms for large-scale, distributed, or multi-cloud systems.
Proficient in reliability engineering practices (SLIs/SLOs, error budgets) and data-driven decision making.
Experienced in automation and configuration-as-code for observability components.
Able to work independently, drive adoption, and champion engineering excellence.
Able to switch between technical discussions with team members and non-technical communication with stakeholders.
Experience architecting observability for large-scale distributed systems, regardless of cloud provider or vendor tool.
Strong experience with one or more observability stacks.
Extensive knowledge of CI/CD pipelines using Azure DevOps or an equivalent.

Experience working in an enterprise environment.
