
Senior Incident & Automation Engineer (AIOps / Reliability) Vice President

Job in Irving, Dallas County, Texas, 75062, USA
Listing for: Citigroup
Full Time position
Listed on 2026-06-02
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, Cybersecurity, IT Support
Job Description & How to Apply Below
Position: Senior Incident & Automation Engineer (AIOps / Reliability) Vice President
**Position Summary**

The Senior Incident & Automation Engineer serves as a critical bridge between the Technology Incident Optimization Program and the core Compute, Virtualization, Cloud Services, and Storage technology domains. This role demands deep technical expertise combined with strategic thinking to drive tactical incident reduction while architecting the future state of intelligent event management and automation.

You will be responsible for building automated incident remediation workflows and achieving measurable incident reduction within your domain through event optimization, correlation, and automation while ensuring comprehensive observability is maintained and enhanced. This position offers the unique opportunity to shape the future of enterprise event management.

**Key Responsibilities**

+ **Incident & Alert Analysis:** Conduct comprehensive analysis of alert and incident patterns to identify top sources of operational noise, determine root causes, and develop data-driven strategies for reduction.

+ **Intelligent Event Management:** Design, implement, and optimize rules for event correlation, de-duplication, and suppression on AIOps and event management platforms. Develop domain-specific correlation logic leveraging configuration management data and infrastructure topology.

+ **Automation & Self-Healing:** Architect and develop automation playbooks for incident data enrichment and create self-healing capabilities for common and recurring infrastructure incident scenarios.

+ **Observability Enhancement:** Assess the current observability footprint across all infrastructure domains to identify gaps and propose enhancements that align with enterprise event management standards.

+ **Cross-Functional Collaboration:** Partner closely with infrastructure operations, engineering, and platform teams to understand incident drivers, validate correlation logic, and provide expert guidance on event management best practices.

+ **Quality Assurance:** Continuously validate the effectiveness of implemented rules and automation to ensure no business-impacting alerts are missed. Monitor and report on alert quality metrics and lead iterative improvements.

**Required Qualifications**

+ **Education:** Bachelor's degree in Computer Science, Information Technology, Computer Engineering, or a related technical field.

+ **Experience:** A minimum of 8 years of hands-on experience in IT operations, infrastructure engineering, or system architecture within large-scale enterprise environments.

+ **Event Management & Incident Reduction:** Proven experience and demonstrated success in leading event management and incident reduction initiatives with quantifiable results. Direct, hands-on experience with modern AIOps and event management platforms is required.

+ **Technical Expertise:**
    + Deep understanding of enterprise infrastructure including virtualization architectures, container orchestration, microservices, and various storage architectures (block, file, object).
    + Expertise with a broad range of domain-specific monitoring tools for compute, virtualization, storage, and cloud platforms.

+ **Automation & Orchestration:** Hands-on experience developing robust automation solutions using scripting languages and modern automation frameworks.

+ **Data Analysis:** Proficiency in log analysis, pattern recognition, and using query languages for data analysis on log aggregation platforms.

+ **Problem-Solving & Analytical Skills:** Excellent analytical abilities with a systematic approach to troubleshooting complex issues and a holistic view of technology systems.

+ **Communication & Leadership:** Exceptional communication skills with the ability to influence and collaborate effectively across diverse, cross-functional teams and present technical concepts to various audiences.

**Preferred Qualifications**

+ An advanced degree (Master's) in a relevant technical field.

+ Relevant industry certifications (e.g., Cloud, Virtualization, Automation, ITIL).

+ Experience with AIOps, machine learning for IT operations, and Site Reliability Engineering (SRE) practices.

+ Knowledge of ITSM platforms, CMDB…
Position Requirements
10+ Years work experience