
Observability Architect

Job in Atlanta, Fulton County, Georgia, 30383, USA
Listing for: TechDigital Group
Full Time position
Listed on 2026-06-18
Job specializations:
  • IT/Tech
    Cloud Computing: Infrastructure & Operations, SRE/Site Reliability, Systems Engineer
Salary/Wage Range or Industry Benchmark: 125,000 - 150,000 USD yearly

Job Description

We are seeking an experienced Observability Architect to design, implement, and mature enterprise-wide observability capabilities across hybrid on-premises and cloud environments. The ideal candidate has deep expertise in log aggregation, metrics, tracing, and application performance monitoring technologies, and can drive automation, standardization, and best‑practice adoption. This role will be a key influencer in shaping the organization’s observability strategy, ensuring end‑to‑end system visibility, performance, and reliability.

Key Responsibilities
  • Observability Architecture & Strategy
    • Develop and maintain the enterprise observability reference architecture, covering logs, metrics, traces, events, dashboards, and alerts.
    • Lead the design and implementation of observability solutions that support hybrid multi‑cloud and on‑premises environments.
    • Establish standards, governance, and reusable frameworks for telemetry generation, ingestion, correlation, storage, and visualization.
    • Drive continuous improvement of monitoring maturity, integrating data‑driven insights and AI‑based analytics where applicable.
  • Log Aggregation & Monitoring Solutions
    • Architect and administer large‑scale log aggregation platforms such as Splunk, supporting both on‑prem and cloud deployments.
    • Define and automate ingestion pipelines, parsing logic, index strategies, role‑based access, and performance tuning.
    • Implement configuration management and infrastructure‑as‑code (IaC) practices for repeatable deployment and scaling of observability tools.
  • Application & Network Performance Monitoring
    • Deploy, configure, and optimize APM solutions such as AppDynamics, Dynatrace, or equivalent platforms.
    • Integrate application tracing, synthetic monitoring, real‑user monitoring (RUM), and business transaction analytics.
    • Support and enhance Network Performance Monitoring (NPM) capabilities to ensure end‑to‑end visibility across distributed systems.
  • Cloud‑Native & Modern Monitoring
    • Leverage cloud‑native monitoring tools across AWS, Azure, or GCP (e.g., CloudWatch, Azure Monitor, GCP Operations Suite).
    • Guide teams in instrumenting microservices, serverless functions, containers, and Kubernetes clusters using OpenTelemetry and modern telemetry standards.
    • Partner with infrastructure, application, and SRE teams to ensure high availability, resilience, and performance.
  • Automation & AI‑Driven Engineering
    • Build automated workflows for alert tuning, anomaly detection, dashboards, and telemetry enrichment.
    • Explore and integrate AI/ML‑based observability features such as predictive analytics, signal correlation, and automated root‑cause analysis.
    • Advocate for automation‑first practices and reduction of operational toil.
Required Qualifications
  • 5+ years of hands‑on experience with enterprise‑scale log aggregation platforms, including architecture, deployment, and administration of tools like Splunk across on‑prem and cloud environments.
  • 5+ years of experience using automated configuration management and IaC tools (e.g., Ansible, Terraform, GitOps frameworks).
  • 2+ years of experience with APM tools such as AppDynamics or Dynatrace, including end‑to‑end application visibility and performance diagnostics.
  • Experience with Network Performance Monitoring tools and methodologies.
  • Strong understanding of cloud infrastructure and cloud‑native monitoring technologies (AWS, Azure, GCP).
  • Familiarity with OpenTelemetry, distributed tracing, and service mesh observability.
  • Expertise in designing dashboards, KPIs, and alerting strategies that align with business SLIs/SLOs.
  • Experience collaborating with DevOps, SRE, cloud engineering, and application teams in large enterprises.
Preferred Qualifications
  • Experience implementing AI/ML‑driven observability capabilities (e.g., anomaly detection, auto‑baselining, correlation engines).
  • Knowledge of container ecosystems and orchestration platforms (Kubernetes, AKS/EKS/GKE).
  • Experience working with event‑driven architectures and microservices environments.
  • Strong scripting or programming skills (Python, PowerShell, Bash, etc.).
  • Relevant certifications (e.g., Splunk Architect, Dynatrace Professional, cloud certifications).
Soft Skills
  • Excellent communication and stakeholder management skills.
  • Ability to lead technical strategy and influence architectural decisions.
  • Strong analytical, troubleshooting, and problem‑solving abilities.
  • Adaptability and curiosity about new technologies and evolving observability trends.