Solution Architect/Team Lead - Observability
Listed on 2026-05-22
IT/Tech
Systems Engineer, Cloud Computing
Job Overview
We are seeking an experienced Solution Architect / Team Lead to drive the design and implementation of a next‑generation, AI‑driven Observability and Anomaly Detection Platform. The platform will support middle‑office operations with advanced monitoring, telemetry engineering, anomaly detection, data validation, visualization, and intelligent remediation. The ideal candidate will bring deep expertise in enterprise observability, AI/ML integration, and scalable platform architecture using tools such as Splunk, Grafana, Datadog, OpenTelemetry, and modern cloud‑native technologies. This is a strategic leadership role requiring strong architecture skills, technical depth, and the ability to collaborate across engineering, operations, security, and business teams.
Key Responsibilities
- Design and implement scalable, secure, and cost‑effective observability architectures
- Lead the development of enterprise monitoring and anomaly detection platforms
- Build and optimize telemetry pipelines for logs, metrics, traces, and events
- Enable AI/ML‑driven anomaly detection, root cause analysis, and automated remediation
- Integrate observability solutions with business SLAs, SLOs, and reliability objectives
- Define platform governance, monitoring standards, and operational best practices
- Collaborate with cross‑functional teams including infrastructure, DevOps, security, and business stakeholders
- Drive platform operationalization, enablement, and adoption across teams
- Evaluate and implement observability tools such as Splunk, Grafana, Datadog, Prometheus, and OpenTelemetry
- Ensure platform scalability, security, compliance, and performance optimization
- Lead architecture reviews, technical design discussions, and implementation strategies
- Mentor engineering teams and provide technical leadership on observability initiatives
Required Qualifications
- 10+ years of experience in Solution Architecture or Enterprise Platform Architecture
- Strong expertise in Observability and Monitoring platforms
- Hands‑on experience with:
  - Splunk
  - Grafana
  - Datadog
  - OpenTelemetry
  - Prometheus / ELK Stack
- Experience building telemetry and monitoring pipelines
- Strong understanding of AI/ML integration for anomaly detection and AIOps
- Experience with root cause analysis and intelligent remediation frameworks
- Expertise in cloud‑native and distributed system architectures
- Strong knowledge of platform security, governance, and operational standards
- Experience defining SLAs, SLOs, and reliability engineering practices
- Strong stakeholder management and cross‑team collaboration skills
- Excellent communication and leadership abilities
Preferred Qualifications
- Experience in AIOps or intelligent observability platforms
- Exposure to GenAI integration within enterprise monitoring systems
- Knowledge of Kubernetes, container monitoring, and cloud observability
- Experience in financial services or middle‑office platforms
- Familiarity with DevOps, SRE, and automation frameworks
Key Skills
- Solution Architecture
- Observability Platforms
- Splunk
- Grafana
- Datadog
- OpenTelemetry
- AIOps
- AI/ML Integration
- Telemetry Engineering
- Root Cause Analysis
- Platform Security
- SLA/SLO Management
- Cloud Monitoring
- Enterprise Architecture
- DevOps & SRE