Senior SRE; Site Reliability Engineer
Dallas, Dallas County, Texas, 75215, USA
Listed on 2026-06-18
IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, IT Support, SRE/Site Reliability
Role: Senior SRE Engineer
Location: Washington, DC - Hybrid
We are seeking a high-caliber Senior SRE Engineer to join a premier client in Washington, DC, to spearhead the evolution of their enterprise observability platform. This is a high-impact role designed for a technical leader with nearly a decade of specialization in Dynatrace SaaS, tasked with architecting and automating large-scale monitoring solutions across complex AWS and Azure environments. You will bridge the gap between infrastructure and applications, leveraging Davis AI and Grail to drive proactive reliability, mentoring cross-functional DevOps teams, and establishing a gold standard for full-stack visibility in a mission-critical, multi-cloud landscape.
- Enterprise Architecture: Lead the design, governance, and rollout of Dynatrace observability for distributed microservices, serverless workloads, and multi-region cloud environments.
- Full-Stack Optimization: Configure deep code-level visibility (PurePath), Smartscape topology mapping, and advanced APM instrumentation to ensure comprehensive system transparency.
- AI-Driven Insights: Harness Davis AI for causal analysis and root cause identification; develop custom dashboards, alerting profiles, and auto-remediation workflows to minimize MTTR.
- End-User Experience: Implement Real User Monitoring (RUM) and Synthetic Monitoring to analyze user journeys and establish performance KPIs.
- Automation & DevOps: Drive "Observability as Code" by building CI/CD pipelines (GitHub Actions, Jenkins) and automating infrastructure via Terraform, CloudFormation, or AWS CDK.
- Log Management: Manage high-volume log ingest pipelines and processing rules using Dynatrace Grail and Log Management features.
- SRE Advocacy: Define and monitor SLIs, SLOs, and error budgets while participating in on-call rotations and developing detailed RCA documentation.
- Extensive Expertise: 9+ years of hands-on experience specifically focused on Dynatrace implementation and management at an enterprise scale.
- Foundational Experience: 5+ years in SRE, DevOps, or Cloud Infrastructure roles, with deep knowledge of Linux systems and networking.
- Cloud Proficiency: Advanced experience navigating and securing AWS and Azure environments.
- Automation Skills: Strong proficiency in Python or similar scripting languages for building self-service tooling and automation.
- Tooling Integration: Proven ability to integrate observability stacks with ITSM and communication tools like ServiceNow, PagerDuty, and Microsoft Teams.
- Methodology: Experience working within a SAFe Agile delivery environment and a solid understanding of the ITIL framework.
- Education: Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- Location/Flexibility: Ability to work on-site in the Washington, DC area as required and provide off-hours support for critical production incidents.
Flexible work-from-home options are available.