Sr. Site Reliability Engineer
Listed on 2026-05-31
IT/Tech
SRE/Site Reliability, Cloud Computing, Systems Engineer, IT Support
Founded in 2017, Obsidian Security was created to close a critical gap: securing the SaaS applications where modern business happens—platforms like Microsoft 365, Salesforce, and hundreds more.
Backed by top investors including Greylock, Norwest Venture Partners, and IVP, we’ve built a complete SaaS security platform to reduce risk, detect and respond to threats, and prevent breaches at the source.
Now, we’re transforming how SaaS is secured—in the era of agentic AI.
Today, Obsidian is trusted by global enterprises such as Snowflake, T‑Mobile, and Pure Storage. We protect more than 200 organizations across North America, Europe, the Middle East, Southeast Asia, Australia, and New Zealand—including many of the world’s largest Fortune 1000 and Global 2000 companies.
Sr. Site Reliability Engineer (SRE)
At Obsidian, our Sr. Site Reliability Engineers ensure the reliability, scalability, and operational excellence of a complex multi‑tenant SaaS platform serving enterprise and financial customers. As an SRE, you will work closely with DevOps, Platform Engineering, and product teams to improve system observability, incident response, and service resilience across the platform.
This is a hands‑on engineering role focused on building operational excellence through monitoring, automation, debugging, and continuous improvement. You will help ensure that issues are detected and addressed quickly while contributing to systems that improve platform reliability at scale.
Responsibilities
- Reliability Engineering: Improve the reliability, availability, and resiliency of Obsidian’s production systems and distributed services
- Detection & Observability: Build and maintain monitoring, alerting, dashboards, and observability tooling to enhance system visibility and reduce operational noise
- Incident Response & Operations: Support incident response, on‑call operations, troubleshooting, and post‑mortem processes to drive operational excellence
- Collaboration: Partner with engineering teams to implement SLI/SLO practices, operational standards, and reliability‑focused workflows
- Execution: Automate infrastructure operations, deployment workflows, and platform tooling across Kubernetes, cloud infrastructure, and data pipelines
Qualifications
- 3‑6 years of experience in Site Reliability Engineering, DevOps, Production Engineering, or related roles
- Experience operating and supporting production systems in AWS and/or GCP
- Familiarity with Kubernetes and Helm in cloud‑native environments
- Experience with observability and monitoring tools such as Prometheus, Grafana, Datadog, or similar platforms
- Exposure to CI/CD systems such as GitLab CI/CD, GitHub Actions, ArgoCD, or equivalent
- Strong troubleshooting and debugging skills across distributed systems and microservices
- Experience writing automation or infrastructure tooling using scripting or programming languages
- Strong systems thinking and a collaborative engineering mindset
Preferred Qualifications
- Experience supporting SaaS platforms in production environments
- Familiarity with incident management and post‑mortem practices
- Exposure to infrastructure‑as‑code and GitOps workflows
- Understanding of SLI/SLO concepts and operational metrics
- Experience with enterprise‑scale monitoring or customer‑facing production systems
Why Join
- Work on reliability challenges across a large‑scale distributed SaaS platform
- Build and improve observability and operational tooling used across engineering
- Gain hands‑on experience with cloud infrastructure, Kubernetes, and production systems
- Help safeguard critical services for enterprise and financial customers
What Success Looks Like
- Production issues are detected and resolved quickly
- Monitoring and alerting provide clear, actionable operational insights
- Reliability metrics and operational practices improve over time
- Engineering teams can effectively troubleshoot and self‑serve observability
- Automation reduces operational toil and improves platform stability
Benefits
- Competitive compensation with equity and 401(k)
- Comprehensive healthcare with dental and vision coverage
- Flexible paid time off and paid holidays
- 12 weeks of new parent or family leave
- Pe…