IT Infrastructure & Incident Specialist
Job in
Redmond, King County, Washington, 98052, USA
Listed on 2026-05-29
Listing for:
Net2Source (N2S)
Full Time position
Job specializations:
- IT/Tech: IT Support, Systems Engineer
Job Description & How to Apply Below
Location:
Redmond, WA. Onsite, local candidates only; 24x7 rotational shifts (including weekends and on-call support)
Shift Requirement
- 24x7 rotational shifts (including weekends and on-call support)
Role Overview
Responsible for 24x7 monitoring, incident management, and operational support of a large-scale hybrid infrastructure. The role ensures high availability, performance, and reliability across Production, DR, and Non-Production environments.
Key Responsibilities
Infrastructure Monitoring & Operations
- Monitor 1200+ servers (Windows/Linux), virtualization platforms (VMware, Nutanix), and web servers
- Oversee PB-scale storage systems (Quantum, Isilon, NAS, SAN)
- Monitor 1200+ network devices including switches, routers, firewalls, VPNs, WAPs, and ISP circuits
- Handle incidents and service requests related to infrastructure and tools
- Perform L1/L2 triage for alerts, incidents, and outages
- Ensure timely resolution and escalation as per SLAs
- Correlate alerts across tools to identify root causes
Application & Service Monitoring
- Track service health and dependencies (web, middleware, backend)
Capacity & Performance Management
- Monitor utilization trends across compute, storage, and network
- Identify bottlenecks and recommend optimizations
Change & Release Support
- Support deployments, patching, and maintenance
- Validate system health before and after changes
Disaster Recovery & Resilience
- Support DR readiness and failover validation
- Participate in DR drills
Reporting & Documentation
- Maintain dashboards, runbooks, and reports
- Provide daily/weekly health and SLA updates
Required Skills
Technical Skills
- Networking: TCP/IP, DNS, VPN, Firewalls, Load Balancers (F5)
- Monitoring tools: New Relic, Splunk, Nagios, Zabbix, Dynatrace, SCOM
- ITSM tools: ServiceNow (preferred)
- Backup tools: Rubrik
Operational Skills
- Strong incident management in 24x7 environments
- Troubleshooting and analytical skills
- Ability to correlate infra, network, and application issues
- Strong communication and coordination
- Ability to work under pressure
- Documentation and reporting skills
Preferred Qualifications
- ITIL Foundation Certification
- Experience in large enterprise or MSP environments
- Exposure to AWS/Azure (preferred)
- Process flows
- Knowledge transfer and mentoring
- Contribution to project deliverables
- Data conversion and maintenance
- Industry best practices and innovative solutions
- Technical configuration and development support