Principal AIOps Engineer

Job in Aurora, Kane County, Illinois, 60505, USA
Listing for: Hispanic Alliance for Career Enhancement
Full Time position
Listed on 2026-05-31
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer, IT Support, Cybersecurity
Salary/Wage Range or Industry Benchmark: 100000 - 125000 USD Yearly
Job Description & How to Apply Below

We're building a world of health around every individual - shaping a more connected, convenient and compassionate health experience. At CVS Health®, you'll be surrounded by passionate colleagues who care deeply, innovate with purpose, hold ourselves accountable and prioritize safety and quality in everything we do. Join us and be part of something bigger - helping to simplify health care one person, one family and one community at a time.

Position Summary

We are seeking a Principal AIOps Engineer with deep reliability and operations experience to build and scale intelligent operations across the enterprise. This role focuses on modernizing IT operations through observability, event intelligence, machine learning, and agentic AI: reducing alert noise, accelerating triage, and enabling closed-loop automation. You will act as a principal-level advisor and technical leader for building an Agentic AI ecosystem for IT operations, with ServiceNow as the ITSM system of record (Incident/Problem/Change) and the backbone for auditable workflows, approvals, and measurable outcomes.

What you will do
  • Lead the AIOps strategy, roadmap, and operating model (intake, triage, automation lifecycle, KPIs) to measurably improve MTTR, alert quality, and operational efficiency
  • Own the observability-to-AIOps pipeline (metrics, logs, traces, events) and drive standardization of telemetry, service health models, and actionable alerting across teams and platforms
  • Design and implement event intelligence: correlation, deduplication, suppression, anomaly detection, incident clustering, and probable-cause analysis using topology/CMDB context
  • Advise operations, service owners, and leadership stakeholders; lead change enablement, adoption, and value measurement for AIOps and agentic automation across the organization
  • Develop ServiceNow-centric AIOps integrations (ITSM + ITOM/Event Management where applicable): event ingestion, alert-to-incident policies, enrichment, assignment/routing, approvals, change workflows, and closure updates for auditable closed-loop ops
  • Establish governance for operational AI (risk controls, approvals, auditability, data access, prompt/response logging, evaluation, and continuous improvement) in partnership with security, compliance, and operations
  • Build and operationalize agentic AI workflows for incident triage and resolution: signal summarization, similar-incident retrieval, knowledge article drafting, ticket updates, stakeholder communications, and human-in-the-loop remediation
  • Enable closed-loop automation and self-healing by connecting AIOps detections to orchestrated actions (runbooks/workflows), with clear approvals, safety checks, and rollback paths
  • Partner with NOC/SOC, infrastructure, and application owners to onboard services into AIOps, define service models, and improve signal quality, escalation paths, and operational readiness
  • Create enablement materials (playbooks, operating procedures, dashboards) and coach teams on AIOps practices, agentic AI usage, and responsible automation
Required Qualifications
  • 10+ years of experience in SRE or production operations supporting highly available services, along with experience working in a product model
  • Proven technical leadership: ability to set direction, lead cross-team initiatives, and advise stakeholders through architecture reviews, tradeoffs, and operational readiness
  • Strong programming/scripting skills (Python preferred) and experience building automation, integrations, and APIs
  • Experience integrating observability platforms and event sources across hybrid environments (cloud/on-prem) and operating production-grade monitoring/event management at scale
  • Strong ServiceNow experience as an ITSM system of record (Incident/Problem/Change; CMDB/asset concepts). Ability to build and operate integrations at scale (REST, webhooks, event management) to support automation and auditability.
  • Automation & Integration Engineering:
    • Python (preferred) for automation and data/ML pipelines; experience building integrations, services, and operational tooling.
    • Workflow orchestration and integrations (ServiceNow APIs, event pipelines, runbook automation) with strong reliability,…