Principal AI Network Architect
Listed on 2025-12-02
Engineering
Systems Engineer, Hardware Engineer
Principal AI Network Architect – Microsoft
Microsoft Silicon, Cloud Hardware, and Infrastructure Engineering (SCHIE) powers Microsoft’s “Intelligent Cloud” mission by delivering core infrastructure across more than 200 online businesses. SCHIE focuses on smart growth, high efficiency, and a trusted experience worldwide, and is looking for passionate engineers to help achieve that mission.
As Microsoft’s cloud business expands, the Cloud Hardware Systems Engineering (CHSE) team plays an instrumental role in defining and delivering operational measures for hardware manufacturing, improving planning, quality, delivery, scale, and sustainability. The team seeks seasoned engineers to innovate and optimize cloud infrastructure.
We are looking for a Principal AI Network Architect to join the team.
Responsibilities
- Technology Leadership – Spearhead architectural definition and innovation for next‑generation GPU and AI accelerator platforms, with a focus on ultra‑high bandwidth, low‑latency backend networks. Drive system‑level integration across compute, storage, and interconnect domains to support scalable AI training workloads.
- Cross‑Functional Collaboration – Partner with silicon, firmware, and datacenter engineering teams to co‑design infrastructure that meets performance, reliability, and deployment goals. Influence platform decisions across rack, chassis, and pod‑level implementations.
- Technology Partnerships – Cultivate deep technical relationships with silicon vendors, optics suppliers, and switch fabric providers to co‑develop differentiated solutions. Represent Microsoft in joint architecture forums and technical workshops.
- Architectural Clarity – Evaluate and articulate tradeoffs across electrical, mechanical, thermal, and signal integrity domains. Frame decisions in terms of TCO, performance, scalability, and deployment risk. Lead design reviews and contribute to PRDs and system specifications.
- Industry Influence – Shape the direction of hyperscale AI infrastructure by engaging with standards bodies (e.g., IEEE 802.3), influencing component roadmaps, and driving adoption of novel interconnect protocols and topologies.
- Bachelor’s Degree in Electrical Engineering, Computer Engineering, Mechanical Engineering, or related field AND 8+ years technical engineering experience OR Master’s Degree in the same fields AND 7+ years technical engineering experience OR equivalent experience.
- 5+ years of experience in designing AI backend networks and integrating them into large‑scale GPU systems.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check:
This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
- Proven expertise in system architecture across compute, networking, and accelerator domains.
- Deep understanding of RDMA protocols (RoCE, InfiniBand), congestion control (DCQCN), and Layer 2/3 routing.
- Experience with optical interconnects (e.g., PSM, WDM), link budget analysis, and transceiver integration.
- Familiarity with signal integrity modeling, link training, and physical layer optimization.
- Experience architecting backend networks for AI training and inference workloads, including Hamiltonian cycle traffic and collective operations (e.g., all‑reduce, all‑gather).
- Hands‑on design of high‑radix switches (≥400 Gbps per port), orthogonal chassis, and cabled backplanes.
- Knowledge of chip‑to‑chip and chip‑to‑module interfaces, including error correction and equalization techniques.
- Experience with custom NIC IPs and transport layers for secure, reliable packet delivery.
- Familiarity with AI model execution pipelines and their impact on pod‑level network design and latency SLAs.
- Prior contributions to hyperscale deployments or cloud‑scale AI infrastructure programs.
Hardware Engineering IC5 – The typical base pay range for this role across the U.S. is USD $139,900 – $274,800 per year. For the San…