Manager, IT Service Delivery; AI/ML Platform Services
Listed on 2026-06-26
IT/Tech
IT Project Manager, Cloud Computing: Infrastructure & Operations, Systems Administrator, SRE/Site Reliability
Location: New Orleans
Manager, IT Service Delivery (AI/ML Platform Services)
Location:
New Orleans, Louisiana, United States
Company:
Entergy
Job Title:
Manager, IT Service Delivery (AI/ML Platform Services)
Work Place Flexibility:
Hybrid
Legal Entity:
Entergy Services, LLC
*** The preferred location for this position is New Orleans; The Woodlands, Texas will also be considered ***
Job Summary
The Manager, AI/ML Platform Services is a key technical leadership role within the IT Platform organization, providing support to the AI organization and other internal users of the AI/ML platform. This position is responsible for leading a team of platform engineers and software engineers who maintain, operate, and enhance the enterprise AI/ML platform infrastructure and own the production environment for products running on the platform.
The manager ensures the platform is reliable, predictable, cost-effective, and secure while supporting the AI products built by the AI organization.
Operating within a traditional IT support model, the manager will establish and maintain effective handoff processes with the AI organization, ensuring seamless transition of AI products from development to operational support. The manager will collaborate with AI product teams to understand platform requirements and ensure the infrastructure meets the needs of AI solutions deployed across the enterprise.
The primary responsibility of the Manager, AI/ML Platform Services is to lead and develop a team of engineers focused on platform operations, maintenance, and continuous improvement. The manager will ensure service level agreements are met, incidents are resolved efficiently, and the platform remains stable and secure. This role requires strong technical expertise in cloud infrastructure and AI/ML platforms combined with excellent operational and people management skills.
Job Duties & Responsibilities
Oversee the day-to-day operations of the AI/ML platform, ensuring high availability, reliability, and performance of all platform services. Monitor platform health, manage capacity planning, and ensure systems meet established service level agreements. Coordinate scheduled maintenance windows and platform updates with minimal disruption to AI product operations.
Establish and manage incident response processes for platform-related issues. Define escalation paths and ensure timely resolution of support tickets. Maintain on-call rotations and ensure 24/7 coverage for critical platform issues. Track and report on support metrics including response times, resolution rates, and customer satisfaction.
Lead, mentor, and develop a team of platform and software engineers. Foster a culture of operational excellence, continuous learning, and customer service. Conduct regular performance reviews, identify training needs, and support career development for team members. Build a collaborative team environment that values quality and responsiveness.
Serve as the primary point of contact between the IT Platform organization and the AI organization for platform-related matters. Participate in AI product planning discussions to understand upcoming platform requirements. Establish and maintain effective handoff processes for transitioning AI products from development to operational support.
Monitor and optimize cloud infrastructure costs associated with the AI/ML platform. Implement cost controls and resource optimization strategies. Provide regular reporting on cloud spend and identify opportunities for cost reduction while maintaining platform performance and reliability.
Ensure the AI/ML platform meets security requirements and compliance standards. Collaborate with IT security, internal audit, and cyber security teams to maintain security controls. Support security audits and implement remediation actions as needed. Ensure platform configurations align with enterprise security policies.
Maintain comprehensive documentation for platform configurations, operational procedures, and troubleshooting guides. Ensure knowledge transfer processes are in place for onboarding new team members and supporting AI organization users. Create and maintain runbooks for common operational tasks and incident response.
Identify and implement improvements to platform stability, performance, and operational efficiency. Automate routine operational tasks where possible. Stay current with platform technologies and recommend upgrades or enhancements that improve supportability and reliability.
Bachelor's Degree in Computer Science, Information Systems, or related technical field. Master's degree preferred.
Minimum Experience
7+ years of experience in IT infrastructure, cloud platforms, or platform engineering.
3+ years of experience leading and mentoring technical teams.
Experience…