Sr. Site Reliability Engineer
Job in Redmond, King County, Washington, 98052, USA
Listed on 2026-06-06
Listing for: Practice By Numbers, Inc.
Full Time position
Job specializations:
- IT/Tech
- Systems Engineer, SRE/Site Reliability, Cloud Computing, IT Support
Job Description & How to Apply Below
This is an engineering-first Senior SRE role.
We’re looking for senior engineers who have:
- Built and shipped significant backend systems and/or distributed platforms
- Owned services end-to-end in production (design → launch → on-call → reliability improvements)
- Led incident response and driven durable follow-ups
- Improved reliability by writing software and changing system design—not by adding manual process
Engineers here own services end-to-end—from design to production reliability.
Important:
This is not a system administrator role. We are explicitly hiring an engineering leader in reliability. An engineering degree is an absolute requirement (BS/MS in CS/CE/EE or a closely related engineering field).
Responsibilities
- Own reliability outcomes for critical services: availability, latency, incident rate, and recovery time.
- Design and build reliable, scalable distributed systems that support mission-critical healthcare workflows.
- Define and operationalize SLOs/SLIs and error budgets; drive adoption across teams and use them to prioritize work.
- Lead incident response for high-severity issues; improve on-call effectiveness and reduce alert fatigue.
- Run blameless postmortems and ensure follow-ups are implemented, measured, and stick.
- Write software to eliminate operational toil: automation, self-service tooling, guardrails, and developer platforms.
- Raise the bar on observability (metrics/logs/traces), alerting strategy, and operational readiness.
- Improve resilience through capacity planning, load testing, performance tuning, and failure testing.
- Mentor engineers (SRE and product engineers) on reliability practices, debugging, and production ownership.
- Drive cross-team improvements like production readiness reviews, release safety (progressive delivery), and standard runbooks.
Required
- Engineering degree is mandatory: BS/MS in Computer Science, Computer Engineering, Electrical Engineering, or a closely related engineering field.
- 6+ years of experience in software engineering, SRE, infrastructure/platform engineering, or a related field.
- Strong programming skills in Go, Python, Java, or similar (production-quality code).
- Proven experience building and operating production backend services or distributed systems.
- Meaningful experience in on-call rotations, incident leadership, and post-incident improvement execution.
- Strong debugging ability across complex systems: latency, saturation, cascading failures, dependency issues.
- Experience with cloud infrastructure (AWS preferred, GCP/Azure acceptable).
- You’ve owned reliability for customer-facing services with clear, measurable improvements (e.g., higher availability, lower MTTR).
- You’ve built internal platforms/tooling that made other engineers faster and reduced operational burden.
- You’ve worked in an SRE culture with SLOs, error budgets, and blameless postmortems.
- You’ve led multi-quarter reliability initiatives spanning multiple teams/services.
- Cloud: AWS
- Containers: Docker, Kubernetes
- Infrastructure as Code: Terraform
- Observability: Prometheus, Grafana, OpenTelemetry
- Languages: Go, Python, TypeScript
- CI/CD: GitHub Actions
This role is not:
- System administration / IT ops / helpdesk
- Manual server patching as a primary responsibility
- A “click-ops” cloud operator role
- Build and operate mission-critical healthcare infrastructure that supports real patient workflows.
- High impact: reliability work directly improves customer trust and revenue-critical operations.
- Small team with high ownership, autonomy, and ability to influence architecture.
- Strong engineering culture focused on automation, simplicity, and measurable outcomes.
Position Requirements
5+ years of work experience