Principal Site Reliability Engineer - Hybrid in MN (Minnetonka area, Minnesota, USA; IT/Tech)

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities.

Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.

If you are located in MN, you will follow a hybrid schedule with four in-office days per week.

Primary Responsibilities

Leadership and Strategy

Develop and execute a comprehensive strategy for SRE, Sec Ops, and Tech Ops aligned with organizational goals, with a focus on improving stability, security, and supportability of all digital properties
Build, lead, and mentor a high‑performing team of SRE, Sec Ops, and Tech Ops professionals, fostering a culture of collaboration, innovation, and continuous improvement
Collaborate with cross‑functional leaders and engineering teams to integrate reliability and security best practices into all aspects of consumer products and platforms, 'baking in' resilience from design to deployment
Guide teams on priorities, mentor individual contributors, and report to CIOs on critical paths, mitigation plans, and strategic initiatives

Site Reliability Engineering (SRE)

Oversee teams that map, proactively secure, and harden end‑to‑end customer journeys across all business units
Analyze and model dependencies (applications, APIs, infrastructure) and conduct threat modeling for risks such as natural disasters, cyberattacks, and software failures
Develop an AIOps and MLOps strategy, and oversee its implementation and rollout across multiple applications
Experience building reusable agentic AI solutions for SRE and Ops functions
Develop and enforce reliability standards, including Service Level Agreements (SLAs), Service Level Indicators (SLIs), and Service Level Objectives (SLOs), using these key metrics to prioritize work and continuously improve system reliability and performance
Implement automation to reduce manual tasks and enhance operational efficiency, and design resilience features such as automated failovers, geo‑redundancy, circuit breakers, and automated rollbacks
Ensure minimal downtime and optimal system performance through proactive threat modeling and risk mitigation

Security Operations (Sec Ops)

Oversee proactive threat detection and response processes to identify and mitigate security threats in real time
Foster collaboration between security and IT operations teams to enhance incident response capabilities and proactive prevention

Technology Operations (Tech Ops)

Ensure the stability, scalability, and resilience of the technology infrastructure, including hardware, software, and network operations
Develop and maintain robust processes for infrastructure provisioning, configuration, and maintenance
Promote a culture of continuous improvement and innovation, optimizing processes and accelerating technology delivery

You’ll be rewarded and recognized for your performance in an environment that will challenge you and give you clear direction on what it takes to succeed in your role as well as provide development for other roles you may be interested in.

Required Qualifications

Bachelor's degree in Computer Science, Information Technology, or a related field, or equivalent practical experience
15+ years of experience in software engineering, site reliability engineering, and/or security or technology operations, with 7+ years serving in a leadership role in one of these areas
Demonstrated experience driving AI‑led innovation, including building agentic AI solutions for Ops or implementing AI/Ops solutions to improve system reliability, security posture, or cloud operations
Solid, proven knowledge of infrastructure management, system monitoring, incident response, software…