NOC Engineer/NOC Analyst
Listed on 2026-05-03
IT/Tech
IT Support, Systems Engineer, Cloud Computing, Cybersecurity
Location
Redmond, WA (local, onsite); 24x7 rotational shifts (including weekends and on-call support), Mon-Sun, 5 a.m.-5 p.m. PT
Shift Requirement
24x7 rotational shifts (including weekends and on-call support)
Role/Summary
Responsible for 24x7 monitoring, incident management, and operational support of a large-scale hybrid infrastructure including servers, virtualization platforms, storage systems, network devices, and applications. Ensure high availability, performance, and reliability across all environments (Prod, DR, Non-Prod).
Must Have Skills
Technical Skills:
- Strong knowledge of:
  - Windows & Linux server administration (basic troubleshooting at L1 and L1.5)
  - Storage systems: SAN/NAS, Isilon, Quantum, or similar PB-scale storage
  - Networking fundamentals: TCP/IP, DNS, VPN, firewalls, load balancers (F5) (L1 and L1.5)
- Experience with monitoring tools (New Relic, Splunk, Nagios, Zabbix, Dynatrace, SCOM, etc.)
- Understanding of ITSM tools (ServiceNow preferred) for incident, change, and problem management, and of the Rubrik backup management tool
Operational Skills:
- Incident management and escalation handling in 24x7 environments
- Strong troubleshooting and analytical skills
- Ability to correlate infrastructure, network, and application issues
- Strong communication and coordination skills
- Ability to work under pressure in critical outage scenarios
- Good documentation and reporting skills
- Experience in large-scale enterprise or MSP environments
- Exposure to cloud or hybrid environments (AWS/Azure) is a plus.
Infrastructure Monitoring & Operations
- Monitor 1200+ servers (Windows/Linux), virtualization platforms (VMware, Nutanix), and web servers for performance and availability
- Oversee storage systems (PB-scale: Quantum, Isilon, NAS, SAN), ensuring uptime and capacity health
- Monitor network infrastructure (1200+ devices), including switches, routers, firewalls, VPN tunnels, WAPs, and ISP circuits
- Monitor and act on incidents and requests related to the infrastructure and tools hosted in the environment
- Perform L1/L2 triage for alerts, incidents, and outages across infrastructure and applications
- Ensure timely incident resolution, escalation, and communication as per SLAs
- Correlate alerts across tools to identify root causes and reduce noise
Application & Service Monitoring
- Track service health, availability, and dependencies (web, middleware, backend systems)
Capacity & Performance Management
- Track utilization trends across compute, storage (multi-PB), and network
- Proactively identify bottlenecks and recommend optimizations
Change & Release Support
- Support infrastructure and application deployments, patches, and maintenance activities
Disaster Recovery & Resilience
- Support DR readiness for large-scale storage and application environments
- Participate in DR drills and failover validation
Reporting & Documentation
- Maintain operational dashboards, runbooks, and incident reports
- Provide daily/weekly health and SLA reports
Office ext.: 444
270 Davidson Ave, Suite 704, Somerset, NJ 08873, USA