DevOps Engineer IV
Listed on 2026-06-09
IT/Tech
Systems Engineer, SRE/Site Reliability
Location: Atlanta, GA
Contract: 3 Years
Experience: Senior Level
Client: Southern Company Services
We are seeking an experienced DevOps Engineer IV / Site Reliability Engineer (SRE) with strong hands-on experience in observability, telemetry, monitoring, and service reliability. The ideal candidate will have deep knowledge of Grafana, OpenTelemetry (OTel), PromQL, and application and system instrumentation.
This role will partner with engineering, operations, and application teams to improve service reliability, telemetry quality, alerting maturity, and operational visibility across complex environments.
Key Responsibilities
- Design, implement, and support monitoring and observability solutions.
- Build dashboards, alerts, and telemetry solutions using Grafana and related tools.
- Implement OpenTelemetry standards for application and system instrumentation.
- Write and optimize PromQL queries for monitoring and reliability insights.
- Improve alerting quality, reduce noise, and create actionable alerts.
- Troubleshoot application and infrastructure issues using logs, metrics, and traces.
- Support incident response, root cause analysis, and reliability improvements.
- Collaborate with engineering, operations, and application teams.
Qualifications
- Strong experience as a DevOps Engineer, SRE, Observability Engineer, or in a similar role.
- Hands-on experience with Grafana, OpenTelemetry, and PromQL.
- Experience with application and system instrumentation.
- Strong understanding of logs, metrics, traces, alerting, and service reliability.
- Ability to design monitoring solutions across complex environments.
- Strong troubleshooting, analytical, communication, and collaboration skills.
- Experience with Prometheus, Loki, Tempo, Kubernetes, containers, cloud platforms, or microservices.
- Familiarity with CI/CD, automation, infrastructure‑as‑code, incident response, SLIs, SLOs, and reliability metrics.
DevOps, SRE, Observability, Grafana, OpenTelemetry, OTel, PromQL, Prometheus, Monitoring, Alerting, Logs, Metrics, Traces, Instrumentation, Incident Response, Root Cause Analysis.