Wireless Reliability Engineer; AP SRE - San Jose, CA
Listed on 2026-06-04
Position: Wireless Reliability Engineer (AP SRE)
Location: San Jose, CA
Mission: Help eliminate bad Wi‑Fi experiences by making Nile’s access point platform measurably more reliable before it reaches production.
Nile delivers Connectivity as a Service for enterprise campuses. Instead of one‑off hardware and manual break/fix testing, we operate a service with strong reliability and security guarantees. This role sits at the intersection of wireless, systems, and software engineering to make that possible.
Role Overview: As a Wireless Reliability Engineer on the AP SRE team, you will own the reliability of Nile’s access point platform across performance, correctness, and security, primarily in pre‑production environments. You will:
- Design and evolve the automation, validation, and chaos frameworks that exercise our APs in CI/CD and in the lab.
- Drive deep L1–L7 investigations when complex wireless issues appear and convert those learnings into durable tests and platform changes.
- Partner closely with firmware, cloud SRE, and security to ensure reliability is engineered in, not bolted on.
This is an individual contributor role at Senior / Staff level, with high technical ownership and visibility.
What You’ll Do
Build the Machine That Tests the Machine
- Architect Python-based automation scripts that integrate into CI/CD to validate AP firmware, drivers, and wireless features at scale.
- Continuously increase automated coverage of functional, performance, and longevity tests for Wi‑Fi features (11ax/11be), roaming, QoS, and management‑plane behavior.
- Define test strategy and guardrails: what must be covered on every change, on nightly runs, and on long‑running stress suites.
- Design and run chaos and stress scenarios against Wi‑Fi and datapath stacks: RF impairments, load patterns, roaming storms, congestion, and failure injections.
- Characterize and harden the system under realistic client and traffic mixes using tools like IxChariot/IxANVL, Spirent, or Alethea (or equivalent).
- Turn intermittent or “rare” failures into reproducible automated tests that block regressions.
- Validate 802.1X, WPA3, and Zero Trust campus behavior, including onboarding flows, policy enforcement, and failure/attack scenarios.
- Work with security engineering to translate threat models into repeatable test plans and automation (e.g., auth storms, misbehaving supplicants, rogue AP/client scenarios).
- Use silicon‑level telemetry and debug hooks on Qualcomm (or similar) AP SoCs to understand RF performance, power, and error behavior under load.
- Collaborate with silicon/firmware teams to correlate lab findings with driver/firmware changes and influence roadmap and design decisions.
- Lead deep technical investigations across L1–L7 when Nile or customer scenarios expose weird or hard‑to‑reproduce behavior.
- Produce clear RCAs that tie symptoms to root cause and result in concrete fixes in firmware, cloud controllers, or test systems.
- Feed every critical RCA back into automation, metrics, or specifications so the same class of issue does not recur.
- Collaboration: Work day‑to‑day with AP firmware, wireless systems, cloud SRE, security, and product teams. You will often be the bridge between RF realities, protocol behavior, and software implementation.
- Scope: Primary focus is pre‑production reliability and validation. You will also engage with selected high‑severity production incidents when deep wireless expertise is needed and then codify those learnings back into tests.
- On‑call: This role is not a traditional 24×7 production on‑call rotation, but you may be pulled into critical incident investigations where AP or wireless expertise is required.
- Design and roll out next‑generation AP reliability test scripts in CI/CD for the AP features that you own.
- Build and stabilize a set of chaos/stress scenarios that uncover new issues in Wi‑Fi performance, roaming, and security under load.
- Lead multiple complex RCAs on wireless issues (lab or field) that directly result in:
- Firmware or SoC…