Senior SRE; Site Reliability Engineer
Prosper, Collin County, Texas, 75078, USA
Listed on 2026-06-06
IT/Tech
Cloud Computing, Systems Engineer, SRE/Site Reliability
Job Description
We are seeking a high‑caliber Senior SRE Engineer to join a premier client in Washington, DC, to spearhead the evolution of their enterprise observability platform. This is a high‑impact role designed for a technical leader with nearly a decade of specialization in Dynatrace SaaS, tasked with architecting and automating large‑scale monitoring solutions across complex AWS and Azure environments. You will bridge the gap between infrastructure and applications, leveraging Davis AI and Grail to drive proactive reliability, mentoring cross‑functional DevOps teams, and establishing a gold standard for full‑stack visibility in a mission‑critical, multi‑cloud landscape.
Role: Senior SRE Engineer.
Location: Washington, DC - Hybrid.
- Enterprise Architecture: Lead the design, governance, and rollout of Dynatrace observability for distributed microservices, serverless workloads, and multi‑region cloud environments.
- Full‑Stack Optimization: Configure deep code‑level visibility (PurePath), Smartscape topology mapping, and advanced APM instrumentation to ensure comprehensive system transparency.
- AI‑Driven Insights: Harness Davis AI for causal analysis and root‑cause identification; develop custom dashboards, alerting profiles, and auto‑remediation workflows to minimize MTTR.
- End‑User Experience: Implement Real User Monitoring (RUM) and Synthetic Monitoring to analyze user journeys and establish performance KPIs.
- Automation & DevOps: Drive 'Observability as Code' by building CI/CD pipelines (GitHub Actions, Jenkins) and automating infrastructure via Terraform, CloudFormation, or AWS CDK.
- Log Management: Manage high‑volume log ingest pipelines and processing rules using Dynatrace Grail and Log Management features.
- SRE Advocacy: Define and monitor SLIs, SLOs, and error budgets while participating in on‑call rotations and developing detailed RCA documentation.
- Extensive Expertise: 9+ years of hands‑on experience specifically focused on Dynatrace implementation and management at an enterprise scale.
- Foundational Experience: 5+ years in SRE, DevOps, or Cloud Infrastructure roles, with deep knowledge of Linux systems and networking.
- Cloud Proficiency: Advanced experience navigating and securing AWS and Azure environments.
- Automation Skills: Strong proficiency in Python or similar scripting languages for building self‑service tooling and automation.
- Tooling Integration: Proven ability to integrate observability stacks with ITSM and communication tools like ServiceNow, PagerDuty, and Microsoft Teams.
- Methodology: Experience working within a SAFe Agile delivery environment and a solid understanding of the ITIL framework.
- Education: Bachelor's degree in Computer Science, Engineering, or a related technical field.
- Location/Flexibility: Ability to work on‑site in the Washington, DC area as required and provide off‑hours support for critical production incidents.
Flexible work from home options available.