Sr Site Reliability Engineer
Listed on 2026-05-21
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, Cybersecurity
About the Role
You'll own the reliability posture of a large-scale healthcare platform. That means infrastructure design, deployment pipelines, observability, incident response, and the hard conversations about when something isn't production-ready. You'll work alongside software engineers and security engineers who are building real capabilities - your job is to make sure what they build actually runs.
This isn't a ticket-queue SRE role. At this level, we expect you to define what good looks like and pull the team toward it.
What You'll Do
- Design and own the infrastructure architecture for a cloud environment: multi-region, high-availability, built for real operational load
- Set reliability standards: SLOs, error budgets, incident response playbooks, runbooks
- Lead the observability practice - define what gets measured, how, and what gets done about it
- Own CI/CD pipeline architecture and deployment strategy across environments
- Be the senior technical voice in design reviews when reliability, scalability, or operational risk is on the table
- Mentor Staff-level engineers - raise the floor on how the team builds and operates systems
- Participate in on-call rotation and lead incident response for platform issues
- Partner with security engineers to ensure infrastructure meets security and compliance requirements without making the platform slow to ship
Required:
- 10+ years of SRE, platform engineering, or infrastructure engineering experience
- Expert-level Kubernetes - you've designed and operated production clusters, not just deployed to them
- Deep Terraform and infrastructure-as-code experience at scale
- Strong CI/CD pipeline design and implementation experience
- Experience operating production systems on a major cloud platform (AWS, Azure, or GCP)
- US citizenship or Lawful Permanent Resident status (Public Trust eligibility required)
Paths In - You Might Be a Fit If You:
- Have been the most senior SRE on a team and found yourself setting architecture direction, not just executing on it
- Come from a hyperscaler, high-growth startup, or product company and want to apply that scale experience to systems where the stakes are higher than uptime SLAs
- Have been carrying a team's platform reliability on your back informally and want a title and scope that match what you're actually doing
- Are a strong infrastructure engineer who wants to work on something more meaningful than the next product sprint
Helpful but Not Required:
- Experience with Kafka, Prisma, or event-driven microservices architectures
- Familiarity with security or compliance frameworks (FedRAMP, NIST 800-53, SOC 2, or similar)
- Experience mentoring or technically leading a distributed engineering team
- Prometheus, Grafana, ELK or similar observability stack experience
Our mission is to protect the institutions that underpin free society from cyber threats. We're a small, mission-driven team that works on problems that matter - from offensive security testing for hospitals and banks to building capabilities for national security missions.
We invest in people who invest in themselves. This isn't a body shop. You'll work with a team that takes pride in technical craft and cares about developing the people who join us.
Benefits
- Health insurance with vision, dental, and HSA
- Life insurance (100% employer-funded)
- 401(k) with 4% match
- Flexible PTO
To all recruitment agencies:
Satine Technologies does not accept agency resumes.