Site Reliability Engineer

Job in Houston, Harris County, Texas, 77246, USA
Listing for: Synthesis Health
Full Time position
Listed on 2026-02-16
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 100,000 to 125,000 USD per year
Job Description
Position: Staff Site Reliability Engineer

Synthesis Health

We are a mission‑ and values‑driven company with tremendous dedication to our customers. Our 100% remote team shares a common goal – to revolutionize healthcare through innovation, collaboration, and commitment to our core values and behaviors.

About the Opportunity

We are looking for a Staff Site Reliability Engineer (SRE) to serve as the guardian of our platform's availability and the architect of our operational maturity.

In this high‑impact role, you will own the strategy and execution required to achieve and maintain a 99.99% availability SLA for our critical healthcare services. You will not just respond to incidents; you will build the automated systems that prevent them. You will design the auto‑scaling architectures and disaster recovery protocols that allow us to handle bursty medical imaging traffic and catastrophic failures without flinching.

This is a hands‑on leadership role. You will define the standards for reliability engineering across the organization, mentor Senior (L4) engineers, and embed SRE principles into our development culture. You will serve as the technical face of reliability to our enterprise customers, providing the architectural assurances they need to trust us with their most critical workflows.

If you are obsessed with automation, intolerant of manual toil, and ready to lead the reliability strategy for a life‑critical platform, we want to hear from you.

Key Responsibilities
  • Own the 99.99% Target:
    You will define the Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for our critical user journeys. You will be accountable for tracking our Error Budgets and governing the release velocity based on platform stability.
  • Incident Management & Forensics:
    You will own the incident response process, serving as the ultimate escalation point for complex production outages. You will lead blameless post‑mortems (RCAs) to identify root causes and ensure systemic fixes are implemented to prevent recurrence.
  • Eliminate Toil:
    You will ruthlessly identify and automate manual operational tasks. Your goal is to engineer yourself out of operations work so you can focus on high‑value reliability architecture.
Business Continuity & Disaster Recovery (BC/DR)
  • Architect for Catastrophe:
    You will design and implement our Business Continuity and Disaster Recovery strategy. You will orchestrate our regional failover capabilities, ensuring we meet aggressive Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
  • Enterprise‑Grade Resilience:
    You will build the technical credibility required to pass demanding enterprise audits. You will demonstrate that our platform is robust, stable, and resilient to unexpected failures through rigorous documentation and proof‑of‑concept demonstrations.
  • "Game Day" Simulations:
    You will lead regular disaster recovery drills and chaos engineering experiments to validate our failover mechanisms, ensuring our team is prepared to handle real‑world failure scenarios.
Scalability & Performance
  • Intelligent Auto‑Scaling:
    You will design and implement sophisticated auto‑scaling strategies (HPA/VPA/Cluster Autoscaler) on Kubernetes (GKE) to handle unpredictable spikes in medical data ingestion.
  • Capacity Planning:
    You will lead capacity planning and cost optimization initiatives, ensuring our infrastructure scales efficiently with our business growth.
Architectural Leadership
  • Resilience Patterns:
    You will work with the Architecture Review Board (ARB) to enforce resilience patterns (circuit breakers, retries, fallbacks, bulkheads) in our application code and service mesh.
  • Mentorship & Culture:
    You will advocate for SRE culture across the engineering organization, mentoring feature teams on how to build operable, observable, and reliable software.
What We're Looking For
  • Deep SRE Experience:
    8+ years of engineering experience, with a significant focus on Site Reliability Engineering or DevOps in a high‑scale, 24/7 production environment.
  • BC/DR Orchestration:
    Proven experience designing active‑passive or active‑active multi‑region architectures. You have successfully executed regional failovers and managed the complexities of data…