Senior Manager, AI Reliability Operations
Morrisville, Wake County, North Carolina, 27560, USA
Listed on 2026-03-06
IT/Tech
Cloud Computing, Systems Engineer
General Information
Req #: WD
Career area:
Software Engineering
Country/Region:
United States of America
State:
North Carolina
City:
Morrisville
Date:
Monday, March 2, 2026
Working time:
Full-time
Additional Locations:
United States of America - Illinois - Chicago
We are Lenovo. We do what we say. We own what we do. We WOW our customers.
Lenovo is a US$69 billion revenue global technology powerhouse, ranked #196 in the Fortune Global 500, and serving millions of customers every day in 180 markets. Focused on a bold vision to deliver Smarter Technology for All, Lenovo has built on its success as the world's largest PC company with a full-stack portfolio of AI-enabled, AI-ready, and AI-optimized devices (PCs, workstations, smartphones, tablets), infrastructure (server, storage, edge, high-performance computing, and software-defined infrastructure), software, solutions, and services.
Lenovo's continued investment in world-changing innovation is building a more equitable, trustworthy, and smarter future for everyone, everywhere. Lenovo is listed on the Hong Kong stock exchange under Lenovo Group Limited (HKSE: 992) (ADR: LNVGY).
To find out more, read about the latest news via our Story Hub.
Description and Requirements
About Our Team
Lenovo is building Quantum, a next-generation hybrid AI platform that spans Windows, Android, and cloud. As part of this initiative, we are expanding the organization behind Qira, Lenovo's cross-device Personal AI that works seamlessly across Lenovo and Motorola products.
We are seeking a Senior Manager, AI Reliability Operations to lead the operational backbone that keeps Qira safe, stable, performant, and continuously improving. This leader will own the Operations pillar within the Qira SRE organization and will be responsible for on-call excellence, incident response, AI change safety, deployment reliability, and production governance across device, edge, and cloud environments.
This is a high-impact leadership role shaping how Qira operates at global scale.
Location:
Open to remote work in the US. The preferred work location is Chicago, IL.
Responsibilities:
- Lead and scale the Operations pillar within Qira SRE, including on-call/NOC, incident management, deployments, and operational readiness.
- Drive operational excellence for Qira's hybrid AI systems across on-device, edge, and cloud environments.
- Establish a world-class follow-the-sun on-call model, ensuring rapid detection, response, and recovery from incidents.
- Own incident response, including command, coordination, communications, and post-incident analysis.
- Create a culture of blameless postmortems and continuous learning.
- Build automation, runbooks, and tooling that dramatically reduce MTTR and operational toil.
- Own the AI change management lifecycle for model, prompt, retriever, index, and policy updates.
- Implement safe rollout mechanisms including shadow testing, canarying, evaluation gates, and automated rollback policies.
- Ensure every production change meets reliability, safety, and auditability standards.
- Own operational governance frameworks, including runbook requirements, change controls and ITSM, incident taxonomies, operational readiness reviews, and reliability sign-off for launches.
- Partner with Security, Compliance, and Product Safety on runtime policy enforcement and operational safeguards.
- Partner with AI/ML, Platform, Firmware, DevOps, and Product teams to ensure reliability and operational criteria are built into every release.
- Collaborate closely with Observability, Service Reliability Engineering, and AI Reliability pillars in a unified reliability mission.
- Advocate for and help prioritize operational improvements across the engineering ecosystem.
- Hire, mentor, and grow a high-performing global team of SREs, DevOps engineers, and incident specialists.
- Foster a culture of accountability, collaboration, and operational craftsmanship.
- Define career paths and leadership opportunities for reliability operations staff.
Qualifications:
- 10+ years in Site Reliability Engineering, Production Engineering, DevOps, or large-scale operations, including 3+ years leading teams.
- Bachelor's Degree in Computer Science, Engineering, or related technical field.
- Experience running mission-critical on-call operations for distributed systems.
- Deep knowledge of incident management, crisis response, and postmortem practices.
- Hands-on experience with CI/CD pipelines, deployments, and change management.
- Experience operating systems in cloud environments (AWS, Azure, GCP).
- Strong understanding of Linux systems, networking, and distributed system fundamentals.
- Excellent leadership, communication, and cross-functional alignment skills.