Monitoring Engineering Production Services Specialist II
Job in Chandler, Maricopa County, Arizona, 85249, USA
Listed on 2026-06-23
Listing for: Bank of America
Full Time position
Job specializations:
- IT/Tech: SRE/Site Reliability, IT Support
Job Description
This role provides support to end users and handles incidents and problem management for multiple applications. The primary focus is on triage activities for all business‑impacting incidents.
Responsibilities
- Leads production support triage efforts
- Manages bridge line troubleshooting
- Engages in technical research and escalates issues to leadership as needed
- Ensures all impacts are accurately recorded and documented in the system of record
- Verifies that documents and wikis are updated and available for use during triage
- Supports on‑call responsibilities for incidents
- Documents application flows, impacts during outages, customer experience, and contacts for support needs
- Provides status updates and technical detail for awareness communications (infrastructure, application and client impact, component points of failure)
- Ensures the accuracy of all communications sent and schedules necessary reconvenes
- Identifies business impact, interprets monitors, dashboards, and logs, and writes queries to quantify and communicate impacts to leadership
- Promotes and enforces production governance during triage/testing, identifies production failure scenarios, vulnerabilities, and improvement opportunities, and escalates issues as needed
- Analyzes, manages, and coordinates incident management activities to detect problems that affect service levels
- Fulfills research requests, ad hoc reports, and offline incidents at the direction of senior team members or the Technology/Production Services teams
Required Qualifications
- Hands‑on experience with Splunk (search, SPL, dashboards, alerts, data onboarding, and tuning)
- Hands‑on experience with Dynatrace (APM, services/entities, alerting profiles, management zones, dashboards)
- Strong understanding of monitoring and observability concepts: logs, metrics, traces, events, and correlation
- Experience supporting production systems, incident management, and operational support
- Knowledge of SRE concepts such as reliability engineering, alert hygiene, post‑incident reviews, and automation
- Experience working with ITSM processes (incident, problem, change) and tracking SI actions to closure
- Basic to intermediate scripting experience (e.g., Python, Shell) for automation and analysis
- Strong communication skills and ability to work across distributed teams in the APAC region
Desired Qualifications
- Experience with advanced Splunk or Dynatrace features (custom metrics, anomaly detection, DQL/SPL optimization, synthetic monitoring)
- Experience integrating monitoring tools with ServiceNow or similar ITSM platforms
- Familiarity with capacity monitoring, performance engineering, or business transaction monitoring
- Relevant certifications (Splunk, Dynatrace, SRE/DevOps, Cloud) are a plus
Skills
- Adaptability
- Analytical Thinking
- Influence
- Production Support Risk Management
- Automation
- Collaboration
- Innovative Thinking
- Result Orientation
- Solution Design
- Business Acumen
- DevOps Practices
- Project Management
- Solution Delivery
- Process
- Stakeholder Management
Shift: 1st shift (United States of America)
Hours per week: 40