Site Reliability Engineer
Job in Seattle, King County, Washington, 98127, USA
Listed on 2026-02-16
Listing for: Astreya
Full Time position
Job specializations:
- IT/Tech
Cloud Computing, Systems Engineer
Job Description & How to Apply Below
This role is the team-facing, consultative side of observability. The senior engineer partners
directly with internal engineering teams to understand their systems, pain points, and reliability
gaps. They translate team needs into observability solutions: dashboards, metrics, SLOs, SLIs,
alerting strategies, and visibility improvements.
How this role works day to day:
- Meet with internal teams to gather technical and operational requirements
- Design and implement tailored observability solutions across tools like Grafana, Sumo Logic, AppDynamics, and New Relic
- Build deeper dashboards for product teams and executive visibility
- Define and maintain SLOs, SLIs, and reliability reporting patterns
- Identify gaps in monitoring or alerting and lead the solutioning
- Partner with embedded SREs across a hub-and-spoke model
- Influence tool consolidation, standards, and enterprise reliability strategy
Top 3 Skills:
- Advanced Grafana Expertise - Strong ability to create complex dashboards, build transformations, define SLOs/SLIs, and integrate with multiple data sources.
- SRE Principles and Systems Thinking - Deep understanding of service health, SLOs, SLIs, error budgets, incident patterns, distributed systems, and reliability engineering fundamentals.
- Cross-Team Collaboration and Technical Requirements Gathering - Ability to sit with teams, understand their needs, translate them into observability solutions, and deliver dashboards, alerting, and reliability patterns.
Core Responsibilities:
- Build dashboards in Grafana for internal teams and leadership.
- Maintain observability tools and handle incoming requests.
- Assist teams with setting up alerting, logging structure, and basic SLOs.
- Instrument new apps into monitoring tools.
- Create repeatable patterns and templates for team onboarding.
- Build playbooks and small automation tasks using Ansible Automation Platform.
Required Skills:
- 3+ years of hands-on observability experience (Grafana required plus supporting tools)
- 2+ years practicing SRE fundamentals (SLOs/SLIs, incident patterns, distributed systems, reliability engineering)
- 5+ total years in SRE, DevOps, cloud, systems, platform, or monitoring engineering roles
- Experience partnering with application teams to gather requirements and deliver solutions
- Strong ability to explain complex concepts clearly to non-SRE partners
Nice to Have:
- Experience with ThousandEyes, AppDynamics, New Relic, or Sumo Logic
- Familiarity with Azure, Kubernetes, CI/CD pipelines, or software delivery platforms
- Experience contributing to observability standards at scale
- Background in high-uptime industries such as travel, finance, telecom, or cloud-based SaaS