Lead Production Support Analyst
Job in Cedar Rapids, Linn County, Iowa, 52404, USA
Listed on 2026-06-03
Listing for: Transamerica
Full Time position
Job specializations:
- IT/Tech: IT Support, Cloud Computing
Job Description
Responsibilities
- Operational & Production Support Leadership
- Lead day-to-day production support operations for Individual Solutions & WFG applications/services, ensuring high availability, performance, and stability.
- Act as the accountable owner for the production support operating model, including L1/L2/L3 routing, on-call rotations, escalation paths, and SLAs/SLOs.
- Oversee and coach a vendor/contractor support team, ensuring quality execution, clear accountability, and consistent outcomes across shifts/time zones.
- Own application onboarding into production support: ensure runbooks, SOPs, architecture diagrams, support metrics, monitoring/alerting, access, and DR/backup readiness are complete and current.
- Establish operational readiness standards across logging, monitoring, access controls, backup, disaster recovery, and maintenance windows.
- Vendor Management & Service Delivery
- Manage vendor performance (tickets, SLAs, MTTR, quality of RCAs, repeat incidents, documentation hygiene) and drive continuous service improvement.
- Run recurring vendor governance: operational reviews, KPI scorecards, backlog prioritization, and corrective action plans.
- Coordinate with third-party providers for escalations, service requests, planned maintenance, patching, and production changes.
- Incident, Problem & Change Management
- Serve as the primary escalation point for high-severity incidents; lead war rooms/bridge calls and drive timely resolution with strong communication.
- Ensure Root Cause Analysis (RCA) and Post-Incident Reviews (PIRs) are completed with actionable remediation, prevention plans, and measurable follow-through.
- Drive problem management: identify patterns and recurring issues using incident history, logs, and metrics; reduce repeat incidents through permanent fixes.
- Oversee change/release execution to minimize production risk: pre-change validation, approvals, rollback plans, post-release monitoring, and “go/no-go” decision support.
- Ensure adherence to ITSM processes and audit-ready evidence for incident/change/problem workflows.
- Monitoring, Observability & Reliability
- Improve detection and response through dashboards, health checks, distributed tracing/APM, synthetic monitoring, and log correlation.
- Tune alerting to improve the signal-to-noise ratio; implement event correlation to prevent alert storms.
- Partner with engineering and platform teams to define and track error budgets (where applicable) and reliability improvements.
- Continuous Improvement, Automation & Incident Reduction
- Proactively identify opportunities for automation (self-healing, auto-remediation, runbook automation, standardized scripts) that reduce toil and improve MTTR.
- Drive operational standardization: repeatable onboarding, consistent runbooks, automated checks, and common monitoring patterns.
- Lead initiatives focused on reducing incident volume, shortening recovery times, improving release quality, and removing manual steps from common procedures.
- Technical Environment:
- Cloud Platforms
- AWS: EC2, Lambda, ECS/EKS, S3, CloudFront, Route 53, IAM, CloudWatch, API Gateway, Secrets Manager
- Azure: Virtual Machines, Azure Functions, App Service, AKS, Entra, Azure Monitor/Log Analytics, Key Vault, API Management, Azure Backup
- Monitoring & Observability
- AppDynamics, Splunk, Prometheus, ELK, CloudWatch, Azure Monitor, Grafana
- Incident & Event Management
- ServiceNow (Incident/Problem/Change/Event), BigPanda, JIRA
- Infrastructure, Middleware & Platforms
- Linux/Windows Server fundamentals; networking basics (DNS, routing, LB, firewall rules)
- Middleware/servers (as applicable): NGINX/Apache, Tomcat/WebLogic/JBoss, Kafka/MQ patterns
- CI/CD & Scheduling
- Jenkins/GitHub Actions/Cloud pipelines (where applicable)
- Control-M/Cron/Airflow (where applicable)
- Security & Access
- IAM/role-based access, certificates, secrets management, key vaults
Qualifications
- 8+ years in production support, IT operations, cloud operations, or SRE/Platform operations, with 3+ years in a lead role (team lead, service owner, or vendor lead).
- Strong knowledge of ITSM/ITIL practices and hands-on experience with ServiceNow (Inc/Prob/Chg; Event Mgmt preferred).
- Demonstrated ability to lead…