Site Reliability Engineer IV Job Mountlake Terrace area, Washington USA, IT/Tech

## Site Reliability Engineer IV

* Locations: Mountlake Terrace, WA
* Time type: Full time
* Posted on: 2 Days Ago
* Job requisition: R28853
* Workforce Classification: Hybrid

**Join Our Team: Do Meaningful Work and Improve People’s Lives**

Our purpose, to improve customers’ lives by making healthcare work better, is far from ordinary. And so are our employees. Working at Premera means you have the opportunity to drive real change by transforming healthcare.

Premera is committed to being a workplace where people feel empowered to grow, innovate, and lead with purpose. By investing in our employees and fostering a culture of collaboration and continuous development, we’re able to better serve our customers. It’s this commitment that has earned us recognition as one of the best companies to work for. Learn more about our recent awards and recognitions as a greatest workplace.

Learn how Premera supports our members, customers and the communities that we serve through our Healthsource blog.

**Site Reliability Engineer IV**

**Job Description Summary**

As a Site Reliability Engineer IV, you will drive reliability and operational excellence across cloud, on-premise, and hybrid platforms. You will build scalable automation and AI-powered tooling to improve system health, reduce manual effort, and accelerate incident response.

Partnering with software and platform engineering teams, you will standardize CI/CD, observability, and incident management practices, enabling resilient, self-healing systems. This role is critical to scaling engineering reliability and advancing intelligent automation across enterprise platforms.

**This is a hybrid role, located on our campus in Mountlake Terrace, Washington.**

**What You’ll Do**

* Build, run, and optimize critical services across cloud, on-premise, and hybrid environments, including managed services, custom applications, and third-party integrations
* Develop automation and AI-powered tooling to reduce manual intervention, including anomaly detection, predictive alerting, and LLM-assisted diagnostics that surface actionable insights
* Design and implement end-to-end observability, telemetry, and self-healing capabilities across platforms
* Lead cross-team efforts to drive root cause analysis, post-incident reviews, and long-term reliability improvements
* Define and drive reliability strategy, standards, and best practices across engineering teams
* Standardize workflows for change management, deployment, and incident response, replacing manual processes with tooling-driven solutions
* Partner with engineering and security teams to ensure deployment pipelines and automation practices meet reliability, safety, and compliance standards
* Influence adoption of modern DevOps practices including CI/CD, infrastructure-as-code, and test-driven development
* Stay current on emerging technologies in AI/ML, DevOps, and platform engineering, and apply them to improve operational efficiency
* Participate in the on-call rotation and support production systems as needed
* Although this role does not involve day-to-day coding, it requires strong technical depth to guide teams, conduct rapid proofs of concept, and provide guidance on performance, reliability, cost, and operational excellence.

**Minimum Qualifications**

* Bachelor’s degree in Computer Science, Information Systems, or related field, or equivalent experience
* 7+ years of experience in Site Reliability Engineering, DevOps, or IT Operations within complex environments
* Demonstrated experience leveraging AI platforms and tooling to design and build automation solutions.

**Preferred Qualifications**

* Hands-on experience applying AI/ML to operational workflows, including anomaly detection, predictive alerting, or intelligent automation at scale
* Advanced experience with Kubernetes, Docker, and container-based platforms
* Deep expertise with event streaming platforms
* Experience working across cloud, on-premise, and hybrid environments
* Experience working in large-scale, regulated enterprise environments.

**Knowledge, Skills, and Abilities**

* Advanced troubleshooting across distributed systems and applications
* Proficiency in one or…