Principal Site Reliability Engineer - Hybrid in MN
Listed on 2026-06-01
IT/Tech
Cloud Computing, Systems Engineer
Optum is a global organization that delivers care, aided by technology to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities.
Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
If you are located in MN, you will follow a hybrid schedule with four in-office days per week.
Primary Responsibilities:
Leadership and Strategy:
- Develop and execute a comprehensive strategy for SRE, Sec Ops, and Tech Ops aligned with organizational goals, with a focus on improving the stability, security, and supportability of all digital properties
- Build, lead, and mentor a high‑performing team of SRE, Sec Ops, and Tech Ops professionals, fostering a culture of collaboration, innovation, and continuous improvement
- Collaborate with cross‑functional leaders and engineering teams to integrate best practices into all aspects of consumer products and platforms, 'baking in' resilience from design to deployment
- Guide teams on priorities, mentor individual contributors, and report to CIOs on critical paths, mitigation plans, and strategic initiatives
- Oversee teams that map, proactively secure, and harden end‑to‑end customer journeys across all business units
- Analyze and model dependencies (applications, APIs, infrastructure) and run threat models for various risks, including natural disasters, cyberattacks, and software failures
- Develop the AIOps and MLOps strategy, and oversee its implementation and rollout across multiple applications
- Experience building reusable agentic AI solutions for the SRE and Ops functions
- Develop and enforce reliability standards, including Service Level Agreements (SLAs), Service Level Indicators (SLIs), and Service Level Objectives (SLOs), using these key metrics to continuously improve system reliability and performance and to prioritize work
- Implement automation to reduce manual tasks and enhance operational efficiency, and design resilience features such as automated failovers, geo‑redundancy, circuit breakers, and automated rollbacks
- Ensure minimal downtime and optimal performance of systems through proactive risk and threat modeling and mitigation
- Oversee proactive threat detection and response processes to identify and mitigate security threats in real time
- Foster collaboration between security and IT operations teams to enhance incident response capabilities and proactive prevention
- Ensure the stability, scalability, and resilience of the technology infrastructure, including hardware, software, and network operations
- Develop and maintain robust processes for infrastructure provisioning, configuration, and maintenance
- Promote a culture of continuous improvement and innovation, optimizing processes and accelerating technology delivery
You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.
Required Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
- 15+ years of experience in software engineering, site reliability engineering, and/or security or technology operations, with 7+ years serving in a leadership role in one of these areas
- Demonstrated experience driving AI‑led innovation, including building agentic AI solutions for Ops or implementing AIOps solutions to improve system reliability, security posture, or cloud operations
- Proven solid knowledge of infrastructure management, system monitoring, incident response, software…