Senior SRE: Cloud Reliability & Automation Leader
Listed on 2026-06-06
IT/Tech
Cloud Computing: Infrastructure & Operations, Systems Engineer, Network Engineer, IT Support
ABOUT THIS POSITION
Operates the company's complex, high-traffic, business-critical internet site communications and/or network-based (cloud) product systems. Plans, designs, and implements scalable local and wide-area network solutions between multiple platforms and protocols (including IP and VoIP). Responsible for system performance; supports and troubleshoots network issues and coordinates installation of items such as routers and switches with appropriate vendors. Develops tools to automate the deployment, administration, and monitoring of a network system.
Provides training and assists with proposal writing. Conducts project planning, cost analysis, and vendor comparisons, and works on project implementation. Works with development teams to improve system operability. Tests the redundancy, resilience, and failover of network elements to ensure uptime standards are met. May be required to provide on-call service coverage with other department employees.
WHAT YOU'LL DO
Design, implement, and maintain automation for infrastructure provisioning, configuration management, and application deployments across on-premises and cloud environments.
Proactively monitor system health, performance, and availability, utilizing a range of observability tools and defining key performance indicators (KPIs) and service level objectives (SLOs).
Lead the investigation and resolution of complex production incidents, perform root cause analysis, and implement preventative measures to minimize future occurrences.
Collaborate with development teams to ensure software is designed for reliability, scalability, and operational efficiency, participating in architectural reviews and providing expert guidance.
Develop and maintain robust incident response procedures, runbooks, and disaster recovery plans.
Contribute to the evolution of our SRE practices, tooling, and standards, driving continuous improvement and knowledge sharing within the team.
Participate in an on-call rotation to provide 24/7 support for critical production systems.
Mentor junior SREs and contribute to the growth and development of the team.
Evaluate and implement new technologies and solutions to enhance system reliability and operational efficiency.
WHAT YOU'LL NEED
Bachelor's degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
5+ years of experience in a Site Reliability Engineering, DevOps, or closely related infrastructure engineering role.
Strong proficiency in at least one scripting/programming language (e.g., Python, Go, Java, Ruby, Bash).
Extensive experience with cloud platforms (AWS, Azure, GCP) including services related to compute, networking, storage, and databases.
Deep understanding of Linux operating systems and networking fundamentals.
Proven experience with infrastructure as code tools (e.g., Terraform, CloudFormation, Ansible).
Solid experience with CI/CD pipelines and related tools (e.g., Jenkins, GitLab CI, GitHub Actions).
Demonstrable expertise in monitoring and alerting systems (e.g., Prometheus, Grafana, Datadog, Splunk).
Strong problem-solving skills with a methodical approach to debugging complex distributed systems.
Excellent communication and collaboration skills, with the ability to work effectively across cross-functional teams.
Experience with containerization technologies (Docker, Kubernetes) is highly desirable.
Familiarity with database technologies (relational and NoSQL) and their operational challenges.
ABOUT WAYSTAR
Through a smart platform and better experience, Waystar helps providers simplify healthcare payments and yield powerful results throughout the complete revenue cycle.
Waystar’s healthcare payments platform combines innovative, cloud-based technology, robust data, and unparalleled client support to streamline workflows and improve financials so providers can focus on what matters most: their patients and communities. Waystar is trusted by 1M+ providers, 1K+ hospitals and health systems, and is connected to over 5K commercial and Medicaid/Medicare payers. We are deeply committed to living out our organizational values:…