Principal Networking Engineer - QoS/Networking
Listed on 2026-02-23
IT/Tech
Systems Engineer, IT Support, Cybersecurity
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences—from AI and data centers, to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture.
We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond. Together, we advance your career.
We are seeking a hands-on Principal Networking Engineer to own end-to-end QoS strategy and implementation across data center SmartNICs/DPUs. You will define traffic classification, shaping, scheduling, and congestion control policies spanning Top-of-Rack (ToR)/leaf/spine switches and host offload (SmartNIC/DPU), ensuring predictable performance for AI/ML, storage, and latency-sensitive services. The ideal candidate combines deep knowledge of L2/L3/L4 QoS, RDMA/RoCE, PFC/ETS/ECN, and switch silicon schedulers/queues with practical experience deploying policies at fleet scale.
We are seeking an experienced Principal Networking Engineer to drive the continued development of existing and future software systems and products. The successful candidate will be responsible for ensuring the functionality, reliability, and performance of our software products while keeping an eye on future enabling and related technologies. The ideal candidate will have a strong background in software engineering, excellent technical skills, and strong communication skills.
KEY RESPONSIBILITIES:
- Own QoS architecture across network tiers (host → NIC/DPU), including classification, policing, shaping, queue mapping, and scheduling strategies for mixed workloads (AI collectives, storage, RPC, control plane).
- Design and implement SmartNIC QoS: map DSCP/PCP to NIC traffic classes, configure hardware TX/RX queues, rate limiters, WFQ/DRR schedulers, and offload paths for RDMA/TCP/UDP.
- Switch QoS policy design: configure PFC, ETS, ECN/RED/WRED, buffer pools, queue thresholds, shared vs. dedicated buffers, and congestion control across multiple ASICs (e.g., Broadcom, NVIDIA/Mellanox, Marvell).
- RDMA/RoCE tuning end-to-end: lossless/loss-tolerant modes, CNP/ECN parameters, RNR/retry behavior, MTU/Jumbo frames, and scalable multi-tenant profiles.
- Performance engineering: build test plans and run micro/macro benchmarks (e.g., _lat/_bw, RCCL/NCCL, iperf, switch counters/telemetry) to validate latency, throughput, tail performance, and fairness.
- Instrumentation & observability: define SLIs/SLOs for QoS (tail latency, drops, PFC events, ECN marks, queue depth, buffer occupancy); integrate with streaming telemetry (gNMI/INT/sFlow) and develop dashboards and alerts.
- Troubleshoot complex incidents: incast, PFC deadlocks, microbursts, head-of-line blocking, unfair scheduling, and noisy neighbors; lead root-cause analysis and corrective actions.
- Scale & automation: deliver declarative QoS via intent-based configs and CI/CD (e.g., Ansible/Salt, NAPALM, gNMI/gNOI, NETCONF/YANG), including pre-deployment simulation and automated canary/rollback.
- Documentation & standards: author design docs, runbooks, and guidance for tenant teams; contribute to internal standards and vendor requirements.
- Strong experience in datacenter networking or systems engineering, with direct ownership of QoS on switches and/or SmartNICs/DPUs.
- Deep knowledge of QoS mechanisms: classification/marking (DSCP/PCP), policing, shaping, queueing (PRIO, WRR/WFQ/DRR), scheduling hierarchies, and buffer management.
- Hands-on with PFC, ETS, ECN/WRED, explicit buffer tuning, and RDMA/RoCE performance/correctness in production.
- Experience configuring merchant switch silicon (e.g., Broadcom Trident/Tomahawk, NVIDIA Spectrum, Marvell Teralynx) via NOS CLIs/SDKs (e.g., SONiC, Cumulus, NX-OS, EOS, Onyx).
- SmartNIC/DPU experience (e.g., NVIDIA BlueField, Intel…