
AI & HPC Network Architect

Job in Atlanta, Fulton County, Georgia, 30383, USA
Listing for: DigiPower X
Full Time position
Listed on 2026-02-08
Job specializations:
  • IT/Tech
    Systems Engineer, Network Engineer

Network Architect and Engineer - AI Factory & HPC

Location:

Hybrid with travel to active POD sites (in AL, NY, NC) as required

Reports to:

Chief Technology Officer

Role Summary:

The Network Architect and Engineer owns the design, deployment, and operation of AI factory networking across DGXX PODs. This role is accountable for building and operating high-performance, low-latency, resilient network fabrics that support large-scale GPU workloads. The scope includes backend GPU fabrics, front-end customer connectivity, and management networks. The focus is on correctness, performance under load, and predictable behavior at scale.

This role exists to ensure that AI workloads are not constrained by network design, configuration, or operations.

Responsibilities:

  • Own network architecture and design for AI factory PODs and data halls
  • Design and operate backend GPU fabrics supporting collective communication workloads
  • Design and operate front-end customer and service connectivity networks
  • Define network standards across switches, optics, cabling, and topology
  • Lead network bring-up, validation, and performance testing for new PODs
  • Own congestion management, loss handling, and traffic isolation strategies
  • Troubleshoot network performance, packet loss, and latency issues under load
  • Coordinate with infrastructure, data center operations, and platform teams
  • Own network readiness for expansions, upgrades, and hardware refreshes
  • Maintain network documentation, diagrams, and operational runbooks

Required Experience

  • Ten or more years in network engineering or network architecture roles
  • Hands-on experience designing and operating large-scale data center networks
  • Strong understanding of AI and HPC networking requirements
  • Experience with InfiniBand and/or Ethernet-based lossless fabrics
  • Experience operating 100G and higher-speed network environments
  • Hands-on experience with switch configuration, optics, and cabling validation
  • Experience troubleshooting performance issues in production networks

Preferred Experience

  • Experience supporting GPU clusters or AI training workloads
  • Experience with multi-fabric designs separating backend, front-end, and management traffic
  • Familiarity with network telemetry and performance monitoring tools
  • Experience working in colocation or shared data center environments

Resume Weightage: 50%

Problem Solutioning Weightage: 50%

Problem 1: AI Fabric Design and Failure Reasoning

You are designing the network for an AI factory POD running large-scale GPU training workloads. The POD will be deployed in a colocation facility and must support sustained collective communication traffic. Multiple PODs are expected to be interconnected in the future.

Task:

  • Describe the network fabrics you would design and why
  • Explain how you would separate backend GPU traffic from other network traffic
  • Identify the most common failure or degradation modes you expect under load
  • Describe how you would detect and isolate those issues in production
  • Explain how your design supports future expansion without major redesign

What we look for

  • Clear understanding of AI workload communication patterns
  • Practical design choices grounded in operational reality
  • Ability to reason about performance and failure modes
  • Ownership of both design and operational outcomes

Formats:

Strictly one page. Attach it separately from your resume. Use bullets or short paragraphs. AI assistance is allowed, but the decisions, reasoning, and tradeoffs must be your own.
