Sr Linux Networking Engineer
Job in San Francisco, San Francisco County, California, 94199, USA
Listed on 2026-02-19
Listing for: Fal
Full Time position
Job specializations:
- IT/Tech: Systems Engineer, Network Engineer, Cloud Computing
Job Description
You are a seasoned networking engineer who has designed, deployed, and operated high-performance networks. At fal, our platform orchestrates AI inference workloads across thousands of GPUs spread over multiple data centers and cloud providers. You will own the network layer that ties it all together, ensuring that model traffic, storage I/O, and control-plane communication are fast, reliable, and secure. You think in terms of packets per second, tail latency, and fabric utilization, and you automate everything you touch.
Key Responsibilities
- Design, build, and operate the network fabric that interconnects our GPU fleet, including spine-leaf architectures, RDMA/RoCEv2 networks for distributed inference, and overlay networks for tenant isolation.
- Own L2/L3 network design across bare-metal and cloud environments, including BGP peering, ECMP, VXLAN/EVPN, and high-bandwidth interconnects between data centers.
- Develop and maintain network automation using Ansible, Terraform, and custom tooling to provision, configure, and validate switches, routers, DPUs, and SmartNICs at scale.
- Instrument deep network observability: build dashboards, alerting, and anomaly detection across our fabric using Prometheus, Grafana, and packet-level telemetry.
- Partner with the Compute and ML Performance teams to tune network paths for AI workloads, minimizing latency for model serving and maximizing throughput for large tensor transfers.
- Drive incident response and root-cause analysis for network-related production issues and build automation to prevent recurrence.
- Evaluate and qualify new networking hardware and software—NICs, switches, DPUs, SONiC, Cumulus, and similar—as we scale to next-generation GPU clusters.
Requirements
- 8+ years of experience building and operating large-scale networks, ideally in GPU cloud, HPC, or hyperscale environments.
- Deep expertise in Linux networking internals: kernel networking stack, iptables/nftables, tc, eBPF, network namespaces, bonding/teaming, and SR-IOV.
- Strong command of routing and switching protocols: BGP, OSPF, ECMP, VXLAN, EVPN, MPLS, and segment routing.
- Hands-on experience with high-performance networking for AI/ML: RDMA, RoCEv2, InfiniBand, GPUDirect, and NCCL tuning.
- Proficiency automating network infrastructure with Ansible, Python, Go, and Git.
- Experience with network-as-code workflows.
- Familiarity with modern network operating systems such as SONiC, Cumulus Linux, Arista EOS, or Nokia SR Linux.
- Experience with network observability stacks: Prometheus, Grafana, sFlow/NetFlow, and packet capture tools.
- Experience with DPU/SmartNIC programming (NVIDIA BlueField, AMD Pensando) and SDN/NFV architectures.
- Contributions to open-source networking projects (SONiC, FRR, DPDK, eBPF/XDP).
- Experience operating networks that support Kubernetes and container-native workloads (Calico, Cilium, MetalLB).
- Familiarity with data center physical layer design, optics, and cabling at scale.
Benefits
- Interesting and challenging work
- Competitive salary and equity
- A lot of learning and growth opportunities
- We offer visa sponsorship and will help you relocate to San Francisco.
- Health, dental, and vision insurance (US)