Incident Management Lead
Job in Deerfield, Lake County, Illinois, 60063, USA
Listed on 2026-06-05
Listing for: Tata Consultancy Services Limited
Full Time position
Job specializations:
- IT/Tech: IT Support, IT Project Manager, Cloud Computing, Systems Administrator
Job Description & How to Apply Below
- 6+ years of IT Service Management experience with a minimum of 3 years in a dedicated Major Incident Management or Incident Commander role in a large enterprise (Fortune 500 / FTSE 100 equivalent complexity).
- ITIL 4 Managing Professional or ITIL 4 Specialist: High-velocity IT certification (ITIL 4 Foundation minimum required).
- Demonstrable experience managing Azure platform incidents: working knowledge of Azure Monitor, Azure Service Health, Log Analytics, Application Insights, and Microsoft support escalation paths.
- Proven ability to command high-pressure P1 incidents involving 20+ stakeholders across technical and executive levels simultaneously.
- Expert-level proficiency in ServiceNow ITSM, including the Incident, Problem, and Change modules and dashboard/report building.
- Strong data analysis skills: ability to analyze incident trends, build KPI dashboards, and present actionable insights to senior leadership.
- Serve as the single accountable owner for all P1 and P2 major incidents across on-premises and Azure-hosted services, from initial declaration through resolution and post-incident closure.
- Convene and chair live incident bridge calls and virtual war rooms using Microsoft Teams, coordinating across 10+ internal technical resolver groups, managed service partners, and Microsoft Azure Support (Unified Support escalations).
- Drive swift triage by leveraging Azure Service Health, Resource Health, and Azure Monitor dashboards to rapidly establish scope, affected services, and blast radius within the first 15 minutes of an incident.
- Make and enforce escalation decisions, including engaging Microsoft CSS P1 Severity A support cases and activating DR runbooks where service restoration via normal means is not achievable within RTO.
- Maintain clear, timely, and audience-appropriate stakeholder communications throughout the incident lifecycle, including CEO/CISO executive briefings for business-critical outages.
- Facilitate structured, blameless Post-Incident Reviews (PIRs) within agreed SLAs (P1: 48 hours; P2: 5 business days); produce high-quality PIR reports consumed by the CTO and the Board Technology Committee.
- Own the incident action item registry; chair weekly SIP (Service Improvement Plan) reviews to ensure commitments are delivered on time and to quality.
- Identify systemic incident patterns through trend analysis using ServiceNow and Log Analytics; collaborate with Problem Management to drive root cause elimination for repeat incidents.
- Define, track, and report on enterprise incident management KPIs: MTTD, MTTR, incident recurrence rate, SLA compliance, and customer impact hours, presented to IT leadership in monthly operational reviews.
- Own, maintain, and continuously improve the enterprise Major Incident Management process, policy, playbooks, and runbooks aligned to ITIL 4 and the organization's IT Risk and Control Framework.
- Define and govern the incident severity classification matrix and escalation decision tree; ensure consistent adoption across all IT towers and managed service partners.
- Maintain and test the enterprise crisis communication framework, including stakeholder notification trees, bridge protocols, and executive communication templates.
- Collaborate with Change Management to ensure CAB processes adequately assess change-induced incident risk; maintain correlation tracking between changes and incidents.
- Develop and maintain Azure-specific incident playbooks covering platform scenarios: AKS node/pod failures, Azure SQL failover events, ExpressRoute circuit drops, Azure Active Directory (Entra) authentication outages, and Azure region-wide service incidents.
- Maintain working relationships with the Microsoft TAM (Technical Account Manager) and Azure Rapid Response team; ensure escalation paths to Microsoft CSS are exercised and SLAs are understood.
- Monitor Azure Service Health and the Microsoft 365 Service Health Dashboard proactively; initiate pre-emptive incident declarations for advisory/degraded-service notifications affecting business-critical services.
- Participate in Azure Operational Reviews with Cloud Platform and SRE teams to identify observability gaps, alerting blind spots, and runbook deficiencies before they manifest as major incidents.
- Design and deliver MIM process training programmes for Level 1/2 Service Desk, resolver groups, and technology leadership; conduct quarterly simulation exercises (Game Day / Incident Ex).
- Act as a subject matter expert in enterprise-wide DR and BCP exercises; validate incident response readiness across all Azure-hosted Tier-0 services.
- Build and manage a network of Incident Coordinators across global IT towers to support follow-the-sun incident coverage.
- Discretionary Annual Incentive.
- Comprehensive…