Site Reliability Engineer - Network Engineer; Onsite - Seattle, WA Job Seattle area, Washington USA, IT/Tech

Position: Site Reliability Engineer 1 - Network Engineer (Onsite - Seattle, WA)

Job Description

Nordstrom Technology seeks an exceptional Site Reliability Engineer with deep networking expertise to join our Nordstrom Operations Center (NOC) team.

You'll maintain "eyes on glass" monitoring of application services and critical network infrastructure, ensuring the health and reliability of Nordstrom's retail operations. This role combines proactive monitoring, incident response, and root cause analysis with advanced network troubleshooting: diagnosing complex issues spanning the full stack and driving resolution of P1/P2 incidents that impact business operations.

This role is offered as onsite in Seattle, WA, supporting Nordstrom's 24/7 NOC. Candidates must be available to work in office at the Nordstrom corporate headquarters 5 days/week with shifts starting at 6:00 AM PST, including one weekend day per week (Saturday or Sunday) as part of regular rotation.

A day in the life...

Monitoring & Incident Response

Maintain real-time monitoring across application services, network infrastructure, and business KPIs (site visitors, order flow, revenue-impacting metrics)
Participate in 24/7 on-call rotations, responding to PagerDuty alerts and managing incidents through ServiceNow workflows
Lead P1/P2 incident troubleshooting, coordinating with engineering teams and vendors to restore service rapidly
Perform real-time network diagnostics and performance testing during active incidents

Network Operations

Monitor and troubleshoot routers, switches, firewalls, load balancers, wireless systems, and SD-WAN solutions
Analyze network performance, identify bottlenecks, and recommend optimization strategies
Investigate connectivity issues, VLAN configurations, routing problems, and security events
Coordinate with network engineering during changes, maintenance windows, and infrastructure upgrades
Maintain visibility into multi-vendor cloud environments (AWS, Azure) and cloud networking architectures

Root Cause Analysis & Continuous Improvement

Conduct deep technical investigations focusing on credential expirations, service account failures, authentication incidents, and cascading failures
Document findings in detailed RCA reports with actionable remediation steps
Build and refine monitoring dashboards to improve Mean Time to Detect (MTTD) and Mean Time to Mitigate (MTTM)

AI-Driven Operations & Automation

Contribute to AI-driven incident detection and automated response initiatives, building autonomous monitoring and remediation capabilities
Develop scripts and automation to remediate common incidents, reduce manual toil, and accelerate response workflows
Create automated health checks and build integrations between monitoring platforms (New Relic, PagerDuty, ServiceNow, Jira)

Observability & Reliability

Enhance monitoring, logging, and alerting using New Relic or similar platforms
Track operational metrics (MTTD, MTTM, incident trends) and build executive-level dashboards
Support SLO/SLI definition and tracking for critical services and network infrastructure
Collaborate with teams to improve fault tolerance, redundancy, and disaster recovery

Collaboration & Leadership

Work closely with software engineering, infrastructure, and network teams to improve operational readiness
Communicate effectively with stakeholders at all levels during incidents and post-incident reviews
Contribute to NOC optimization including shift scheduling and process improvements

You own this if you have...

Required Technical Skills

Networking Expertise

Strong understanding of TCP/IP, OSI model, routing protocols (BGP, OSPF), and switching technologies
Experience troubleshooting network connectivity, packet loss, latency, and performance issues
Proficiency with network monitoring tools and packet analysis (Wireshark, tcpdump, NetFlow/sFlow)
Knowledge of DNS, DHCP, VLANs, VPNs, firewalls, load balancers, and network security
Hands-on experience with MIST or Aruba wireless management or similar enterprise wireless platforms
Deep understanding of Juniper Networks routing and switching platforms

SRE & Infrastructure

1-3+ years in site reliability engineering, NOC operations, or similar roles (flexible based on networking depth)
Proficiency with New Relic or similar enterprise monitoring platforms
Strong cloud platform experience (AWS, Azure) and cloud networking concepts
Hands-on containerization and orchestration experience (Docker, Kubernetes/NSK)
Familiarity with Kafka streaming platforms and CI/CD pipelines

Programming & Automation

Proficiency in Python, Go, Bash, or PowerShell for automation and troubleshooting
Experience with REST APIs and system integrations

Operational Excellence

Proven track record managing P1/P2 incidents in 24/7 production environments
Experience with PagerDuty, ServiceNow, and Jira
Strong analytical skills diagnosing complex, multi-layered technical issues under pressure
Root cause analysis experience with detailed technical documentation

Preferred Qualifications

Bachelor's degree in computer science, engineering, networking, or equivalent degree
Network or Security…

