
Principal Manager, Incident Management - AMER

Job in Atlanta, Fulton County, Georgia, 30301, USA
Listing for: Microsoft Corporation
Full Time position
Listed on 2026-06-23
Job specializations:
  • Engineering
    Systems Engineer
Overview

Microsoft Cloud Infrastructure and Operations (CO+I) is the engine that powers Microsoft's cloud services. The group is responsible for designing, building, and operating Microsoft's global datacenters; managing the programmatic delivery of our critical infrastructure design, equipment procurement, construction delivery, infrastructure innovation, demand planning, and capacity utilization of our unified infrastructure; and carrying out all operations needed to run the physical infrastructure. We focus on smart growth with an emphasis on automation, data-driven engineering, cost-effectiveness, and environmental sustainability.

We deliver the core infrastructure and foundational technologies for Microsoft's 200+ online businesses, including Azure, Office 365, Bing, Xbox Live, Skype, and OneDrive. Our portfolio is built and managed by a team of subject matter experts working 24x7x365 to support services for more than 1 billion customers and 20 million businesses in over 90 countries worldwide.

Within CO+I, the Data Center Incident Management Team (DCIM) is responsible for 24x7x365 incident management for Microsoft data centers worldwide. Within the DCIM Team, we are seeking a highly motivated and experienced Principal Manager, Incident Management - AMER to join our team. If you are a strategic thinker with a passion for driving business success, we encourage you to apply for this exciting opportunity. This role will require participation in an on-call rotation, including availability during evenings, weekends, and/or holidays to support business needs.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

* Lead end-to-end incident management and crisis response at scale, orchestrating complex, multi-team mitigation efforts, driving rapid restoration, and ensuring clear, timely communication with stakeholders and leadership.

* Drive service reliability and operational excellence, holding teams accountable to SLOs, improving Time to Detect (TTD) and Time to Mitigate (TTM), and embedding best-in-class incident, problem management, and post-incident review practices.

* Define and execute reliability engineering strategy, advancing telemetry, alerting, automation, and predictive monitoring capabilities to proactively identify issues, reduce noise, and improve system resilience.

* Build and scale cross-organizational partnerships and capabilities, developing deep technical expertise, standardizing processes, and enabling consistent, high-quality incident response across services and regions.

* Lead and develop high-performing teams, fostering a culture of accountability, continuous improvement, and inclusion while coaching engineers and leaders to deliver measurable reliability and customer impact.

* This role will require participation in an on-call rotation, including availability during evenings, weekends, and/or holidays to support business needs.

* Embody our culture and values.

Qualifications

Required qualifications

* Bachelor's Degree in Mechanical Engineering, Electrical Engineering, Information Technology, Facilities Management, Aerospace Engineering, or related field AND 6+ years technical experience in critical environment, network engineering, service engineering, systems engineering, or industrial controls OR equivalent experience.

Other Requirements:

Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

* Microsoft Cloud Background Check:
This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications:

* Master's Degree in Mechanical Engineering, Electrical Engineering, Information Technology, Facilities…