HPC Network Solutions Architect
Listed on 2025-12-19
-
IT/Tech
Systems Engineer, Cloud Computing, Network Engineer, Data Engineer
Title: HPC Network Solutions Architect
Location: Dallas, TX (HYBRID, Relocation Assistance Provided)
Duration: Permanent Direct Hire
Compensation: $200K - $325K base salary, plus bonus
Work Requirements: US Citizen, GC Holders or Authorized to Work in the U.S.
HPC Network Solutions Architect
As an HPC Network Solutions Architect, you will design, integrate, and optimize high-performance networking architectures that form the backbone of HPC, AI/ML, and data-intensive workloads. You will act as a trusted advisor to customers, guiding them across the entire solution lifecycle — from requirements gathering and design, through proof-of-concept and deployment, to optimization and long-term adoption.
This is a customer-facing, technically focused role. You will collaborate closely with customers to align low-latency, high-bandwidth networking designs with their workload requirements, while also working with internal engineering and product teams to influence roadmap priorities.
Your role will bridge the gap between cutting-edge networking technologies (Infini Band, RoCE, EVPN, VXLAN) and real-world HPC adoption at scale.
This position offers the opportunity to shape the future of HPC networking, deliver measurable impact for customers, and influence vendor ecosystems by incorporating emerging innovations into enterprise-ready solutions.
- Act as the primary networking SME for customers adopting or scaling HPC environments.
- Partner with customers to capture network performance goals, scalability requirements, and integration constraints.
- Design and document end-to-end HPC network architectures, including Ethernet, Infini Band, RoCE, EVPN, and VXLAN fabrics.
- Lead proof-of-concept and benchmarking engagements, validating low-latency and high-throughput designs against workload requirements.
- Optimize multi-vendor, multi-protocol data center and HPC interconnects, addressing scaling challenges such as data gravity and throughput bottlenecks.
- Define integration strategies across compute, storage, orchestration, and security layers to deliver resilient, workload-aware solutions.
- Conduct network performance assessments and tuning, identifying bottlenecks and recommending enhancements.
- Build observability frameworks for HPC networks at scale using tools like Prometheus, Grafana, and vendor telemetry.
- Collaborate with engineering, product, and operations teams to refine architecture blueprints and ensure consistent delivery.
- Partner with ecosystem vendors (e.g., NVIDIA, Mellanox, Cisco, Arista) to integrate cutting-edge features and influence roadmap evolution.
- Stay current with emerging HPC networking technologies and protocols, providing future insight to customers on adoption strategies.
- Represent the organization at customer design sessions, workshops, and industry events, building strong technical relationships.
- Demonstrated experience in HPC networking solution architecture, systems design, or data center network engineering.
- Strong expertise in Infini Band and RoCE protocols, including deployment and tuning at scale.
- Hands‑on experience designing and implementing large‑scale Ethernet networks, including BGP, OSPF, EVPN, and VXLAN.
- Deep understanding of GPU communication frameworks such as MPI and NCCL, and their integration with HPC interconnects.
- Proficiency with Linux‑based environments and scripting (e.g., Python, Bash, Power Shell) for automation.
- Experience supporting multi‑vendor environments and evaluating new networking platforms.
- Ability to translate complex networking requirements into clear solution architectures and present them effectively to customers.
- Strong customer‑facing communication skills, including the ability to engage executives and technical stakeholders alike.
- Experience delivering HPC or AI/ML workloads across large‑scale, low‑latency network infrastructures.
- Familiarity with CNI plugins (Multus, Cilium, NVIDIA CNI) for HPC/Kubernetes environments.
- Exposure to automation and infrastructure‑as‑code practices for network provisioning (Terraform, Ansible).
- Experience in vendor collaboration, including influencing feature roadmaps and participating in joint…
(If this job is in fact in your jurisdiction, then you may be using a Proxy or VPN to access this site, and to progress further, you should change your connectivity to another mobile device or PC).