Senior Site Reliability Engineer (SRE) - Dynatrace & Azure Observability Expert
Job in Atlanta, Fulton County, Georgia, 30339, USA
Listed on 2026-06-23
Listing for: RaceTrac, Inc.
Full Time position
Job specializations:
- IT/Tech
- IT Support, Systems Engineer, Cybersecurity, SRE/Site Reliability
Job Description & How to Apply Below
RaceTrac Company Overview
Job Description:
We are seeking a highly experienced Site Reliability Engineer (SRE) with deep expertise in Dynatrace, observability engineering, and Azure cloud technologies. This role will be exclusively focused on building, enhancing, and managing enterprise observability, telemetry, monitoring, and proactive reliability engineering practices across critical digital platforms.
The ideal candidate must possess advanced hands-on expertise in Dynatrace, especially Dynatrace Query Language (DQL), along with strong knowledge of Azure Monitor, Azure KQL, Application Insights, Azure Functions, APIM, and distributed telemetry concepts. The candidate should have a strong understanding of .NET application architecture and the ability to read and analyze .NET code to support troubleshooting, root cause analysis, and observability implementation within Azure environments.
Experience enabling observability for mobile platforms such as iOS and Android is also required.
This is a highly technical, hands-on role requiring a proactive engineering mindset, strong analytical capabilities, and the ability to collaborate across engineering, cloud, mobile, and business teams.
What You'll Do
Dynatrace & Observability Engineering
- Serve as the primary Dynatrace SME across the organization.
- Design, develop, and optimize enterprise observability solutions using Dynatrace.
- Develop advanced Dynatrace DQL queries, dashboards, workflows, alerts, and analytics.
- Implement intelligent monitoring strategies for applications, APIs, integrations, Azure services, mobile platforms, and distributed systems.
- Continuously improve observability maturity through telemetry standardization, proactive monitoring, and automation.
- Configure and tune alerting mechanisms to improve signal-to-noise ratio and reduce alert fatigue.
- Leverage Dynatrace Davis AI, anomaly detection, and AI-driven root cause analysis capabilities.
- Enable and enhance observability for mobile applications across iOS and Android platforms.
- Build and maintain monitoring solutions using:
  - Azure Monitor
  - Application Insights
  - Azure Log Analytics
  - Azure KQL
- Monitor and troubleshoot Azure Function Apps, App Services, APIs, integrations, and backend services.
- Analyze telemetry, traces, logs, metrics, and distributed transactions to identify root causes and performance bottlenecks.
- Troubleshoot cloud-native applications and Azure infrastructure issues.
- Develop proactive monitoring for cloud services, integrations, APIs, and backend processing systems.
- Monitor and troubleshoot Azure API Management (APIM), API Gateways, API endpoints, and integrations.
- Understand end-to-end API transaction flows and dependency mapping.
- Build observability solutions for APIs, middleware platforms, and integration services.
- Diagnose latency issues, transaction failures, authentication issues, and backend service degradation.
- Enable telemetry, monitoring, tracing, and performance analysis for iOS and Android applications.
- Analyze mobile-to-backend transaction flows and end-user experience metrics.
- Troubleshoot mobile application latency, crash analytics, API failures, and connectivity issues.
- Correlate mobile telemetry with backend application and infrastructure monitoring data.
- Utilize prior .NET development experience to troubleshoot application behavior, performance, and deployment issues.
- Read and understand .NET application code to support root cause analysis and observability implementation.
- Work closely with development teams to understand application logic, API flows, dependencies, and exception handling.
- Support Azure Function deployments, configuration management, scaling, and runtime troubleshooting.
- Collaborate with development teams during architecture reviews and production releases.
- Ensure observability and monitoring readiness before deployments go live.
- Perform deep technical analysis across systems by correlating logs, metrics, traces, and application telemetry.
- Conduct root cause analysis (RCA) for recurring incidents and systemic issues.
- Partner with engineering and operations teams to implement preventive improvements and automation.
- Develop KPI-driven reliability improvements focused on system stability, performance, and operational excellence.
- Proactively identify risks, bottlenecks, failure patterns, and reliability concerns before business impact occurs.
- Automate operational workflows and monitoring processes wherever possible.
- Improve operational efficiency using AI-driven insights and automation capabilities.
- Build reusable monitoring frameworks, dashboards, and telemetry standards.
- Drive observability best practices across engineering teams.
Mandatory Technical Skills
- 10+ years of overall IT experience.
- Expert-level hands-on experience with Dynatrace.
- Advanced expertise in Dynatrace…
Position Requirements
10+ years of work experience