Senior SRE; Site Reliability Engineer Job Prosper Texas USA,IT/Tech

Position: Senior SRE (Site Reliability Engineer)

Job Description

We are seeking a high‑caliber Senior SRE to join a premier client in Washington, DC, to spearhead the evolution of their enterprise observability platform. This is a high‑impact role designed for a technical leader with nearly a decade of specialization in Dynatrace SaaS, tasked with architecting and automating large‑scale monitoring solutions across complex AWS and Azure environments. You will bridge the gap between infrastructure and applications, leveraging Davis AI and Grail to drive proactive reliability, mentoring cross‑functional DevOps teams, and establishing a gold standard for full‑stack visibility in a mission‑critical, multi‑cloud landscape.

Role:
Senior SRE.

Location:

Washington, DC (Hybrid).

Core Responsibilities

Enterprise Architecture:
Lead the design, governance, and rollout of Dynatrace observability for distributed microservices, serverless workloads, and multi‑region cloud environments.
Full‑Stack Optimization:
Configure deep code‑level visibility (PurePath), Smartscape topology mapping, and advanced APM instrumentation to ensure comprehensive system transparency.
AI‑Driven Insights:
Harness Davis AI for causal analysis and root‑cause identification; develop custom dashboards, alerting profiles, and auto‑remediation workflows to minimize MTTR.
End‑User Experience:
Implement Real User Monitoring (RUM) and Synthetic Monitoring to analyze user journeys and establish performance KPIs.
Automation & DevOps:
Drive 'Observability as Code' by building CI/CD pipelines (GitHub Actions, Jenkins) and automating infrastructure via Terraform, CloudFormation, or AWS CDK.
Log Management:
Manage high‑volume log ingest pipelines and processing rules using Dynatrace Grail and Log Management features.
SRE Advocacy:
Define and monitor SLIs, SLOs, and error budgets while participating in on‑call rotations and developing detailed RCA documentation.

Qualifications

Extensive Expertise:
9+ years of hands‑on experience specifically focused on Dynatrace implementation and management at enterprise scale.
Foundational Experience:
5+ years in SRE, DevOps, or Cloud Infrastructure roles, with deep knowledge of Linux systems and networking.
Cloud Proficiency:
Advanced experience navigating and securing AWS and Azure environments.
Automation Skills:
Strong proficiency in Python or similar scripting languages for building self‑service tooling and automation.
Tooling Integration:
Proven ability to integrate observability stacks with ITSM and communication tools like ServiceNow, PagerDuty, and Microsoft Teams.
Methodology:
Experience working within a SAFe Agile delivery environment and a solid understanding of the ITIL framework.
Education:
Bachelor's degree in Computer Science, Engineering, or a related technical field.
Location/Flexibility:
Ability to work on‑site in the Washington, DC area as required and provide off‑hours support for critical production incidents.

Flexible work‑from‑home options are available.
