GIS ITSM Operations Manager
Listed on 2026-02-07
IT/Tech
IT Support, Cloud Computing
The group you'll be a part of
The Global Information Systems Group is dedicated to the success of Lam through providing best-in-class and innovative information system solutions and services. Together, we support users globally with data, information, and systems to achieve their business objectives.
The impact you'll make
The ITSM Operations Manager leads GIS-wide service operations across Incident, Problem, Change, Request, Configuration, and Release Management. This role owns the CAB, CMDB health and automation, observability signal integration, the enterprise developer portal for GIS platform services, and the SLO/SLA framework with error budgets. Success is measured by accelerated MTTR, improved change success rates, high CMDB accuracy, reduced alert noise, clear SLOs with actionable error budgets, and seamless release coordination, with strong guardrails, automation, and AIOps that scale reliably across all infrastructure and applications.
What you'll do
Change & Release Management
- Manage the Change Advisory Board (CAB): Chair the CAB and enforce policy, risk assessment, segregation of duties, and approval workflows.
- Establish GIS-wide release calendar: Maintain a unified calendar across platforms; coordinate release windows, blackout periods, and dependencies.
- Automate release notes and org communications: Drive auto-generated release notes, stakeholder notifications, and post-release reporting.
- Compliance & audit readiness: Track change success/failure rates, rollback trends, and adherence to policy and regulatory requirements.
- Own CMDB strategy and data model: Define CI classes, relationships, and normalization/enrichment rules aligned to GIS architecture.
- Automated inventory management: Implement discovery, reconciliation, and lifecycle updates for all infrastructure.
- Application Portfolio Management (APM) integration: Incorporate APM into the CMDB to deliver intelligent impact assessments for change requests and incident impact analysis.
- Data quality & completeness: Measure and improve accuracy, coverage, and timeliness of CI records via automated controls.
- Design and automate workflows: Standardize and automate Incident, Problem, and Request processes for consistency and speed.
- MTTR reduction using AIOps: Deploy correlation, noise reduction, anomaly detection, and runbooks to accelerate triage and resolution.
- Root cause & trend analysis: Lead problem management to eliminate recurring issues and drive preventative fixes.
- Self-healing automation: Define and maintain auto-remediation playbooks, guardrails, and approvals for safe execution.
- Integrate observability platforms with ITSM: Correlate alerts, enrich tickets with telemetry and topology, and route to the right resolver groups.
- Define SLOs/SLIs & error budgets: Partner with SRE/platform teams to set service level objectives.
- Operationalize error budgets: Implement policies for budget consumption, burn-rate alerts, and automated actions when thresholds are crossed.
- Alert hygiene & event management: Apply suppression, deduplication, and dynamic thresholds; use health models, business service mapping, and SLO-based alerting to improve prioritization and impact assessment.
- Reliability/velocity balance: Use error budgets to make data-driven decisions that balance feature delivery speed with service reliability.
- Operational metrics & dashboards: Create actionable analytics across Incident, Change, Request, Problem, Release, and CMDB health.
- SLO & error budget reporting: Publish service SLO compliance, error budget burn rate, budget exhaustion events, and their correlation with releases/changes and incidents.
- Executive reporting: Provide weekly/monthly scorecards, trend analyses, and recommendations to leadership and CAB.
- Data-driven improvements: Use quantitative insights to prioritize automation, address bottlenecks, and improve service levels.
- Define portal strategy & taxonomy: Publish self-service catalogs, API documentation, SLAs/SLOs, error budget policies, standards, and onboarding guides.
- Automate self-service: Enable automated provisioning, change submissions, runbook execution, and status/SLO visibility for developers.
- Governance & lifecycle: Keep content current; measure adoption, satisfaction, and request deflection.
- ITSM Expertise: Deep experience with Incident, Problem, Change, Request, CMDB, CAB, and Release Management in enterprise environments.
- SLOs & Error Budgets: Practical experience defining SLIs/SLOs, setting error budgets, and integrating them into operational decision-making (change gating, incident priority, postmortems).
- Automation & AIOps: Runbooks, orchestration, correlation, noise reduction, and auto-remediation.
- Observability: Hands-on with metrics, logs, traces; integrating tools (e.g., Datadog, Dynatrace, New Relic, Splunk, Azure Monitor, Prometheus/Grafana) into ITSM.
- CMDB & Discovery: CI modeling,…