Sr. Site Reliability Engineer
Listed on 2026-02-15
IT/Tech
Systems Engineer, Cloud Computing, SRE/Site Reliability, Cybersecurity
About Open Loop
Open Loop was co-founded by CEO, Dr. Jon Lensing, and COO, Christian Williams, with the vision to bring care anywhere. Our telehealth support solutions are thoughtfully designed to streamline and simplify go‑to‑market care delivery for companies offering meaningful virtual support to patients across an expansive array of specialties, in all 50 states.
The Role
As a Senior SRE at Open Loop, you’ll ensure the reliability, scalability, and security of our cloud infrastructure supporting our telehealth platform. You’ll partner closely with engineering and security teams to improve automation, observability, and incident response. Based in our Nashville office, you’ll help build resilient systems and foster a culture of shared ownership and continuous improvement.
Key Responsibilities
- Cross‑Functional Collaboration
- Partner with engineering teams to improve system reliability and deployment practices
- Engage with teams on SRE guidelines and best practices for automation and infrastructure
- Work with security teams to implement secure, compliant infrastructure
- Operational Excellence
- Ensure 24/7 system availability and rapid incident response
- Implement and maintain disaster recovery and business continuity plans
- Lead efforts to increase automation, observability, and monitoring
- Identify bottlenecks at infra, app, and network layers for performance tuning
- Security
- Understand cloud security principles (least privilege, network segmentation, encryption at rest/in transit)
- Be familiar with compliance frameworks (SOC 2, ISO 27001, GDPR, HIPAA) and how SRE supports them
- Cultural
- Advocate for blameless culture and continuous improvement
- Collaborate closely with product and engineering to make reliability a shared responsibility
- Ability to work a hybrid schedule in our Nashville office 3 days/week
- 4+ years of experience in infrastructure, DevOps, or Site Reliability Engineering
- Proven track record implementing large‑scale, distributed systems
- Strong background in AWS, particularly with serverless architecture and container orchestration
- Solid understanding of observability, incident management, and system resilience best practices
- Strong proficiency in at least one programming language (TypeScript, Python, Go, etc.)
- Knowledge of Linux/Unix systems and networking
- Experience with Infrastructure as Code (AWS CDK, CloudFormation)
- Proficiency with monitoring and observability tools (Prometheus, Grafana, ELK, etc.)
- Knowledge of CI/CD pipelines and deployment automation (GitHub Actions, Jenkins, etc.)
- Understanding of database systems and performance optimization
- Ability to translate technical concepts to non‑technical audiences
- Experience with agile methodologies and project management
- Medical, Dental, and Vision plans
- Flexible Spending/Health Savings Accounts
- Flexible PTO
- 401(k) + Company Match
- Life Insurance, Pet Insurance, and more
We have a relatively flat organizational structure here; everyone is encouraged to bring ideas to the table and make things happen. This fits in well with our core values of Autonomy, Competence, and Belonging, as we want everyone to feel empowered and supported to do their best work.
Sound like a good fit? We’d love to meet you.