Job Summary
We are seeking a highly skilled InfiniBand Engineer with strong expertise in advanced networking technologies to design, deploy, and support high-performance, low-latency network infrastructures. The ideal candidate will have hands-on experience with InfiniBand fabrics, data center networking, and large-scale distributed computing environments (HPC / AI / ML clusters).
Key Responsibilities
Design, implement, and manage large-scale InfiniBand (IB) fabrics in data center and HPC environments.
Configure and troubleshoot InfiniBand switches and adapters (e.g., Mellanox / NVIDIA IB platforms).
Perform fabric bring-up, subnet management (OpenSM), partitioning, and performance tuning.
Monitor and optimize network performance, latency, throughput, and congestion control.
Integrate InfiniBand with Ethernet-based networking environments.
Support RDMA technologies (RoCE, iWARP) and GPUDirect environments.
Collaborate with system, storage, and compute teams to support AI/ML and distributed workloads.
Perform firmware upgrades, patching, and capacity planning.
Troubleshoot Layer 2 / Layer 3 networking issues (BGP, OSPF, VLAN, VXLAN, etc.).
Maintain documentation, network diagrams, and SOPs.
Required Skills & Qualifications
5+ years of networking experience with strong fundamentals (TCP/IP, routing, switching).
Hands-on experience with InfiniBand technologies (HDR/NDR preferred).
Experience with NVIDIA / Mellanox Technologies switches and adapters.
Strong understanding of RDMA, congestion control, QoS, and low-latency tuning.
Experience with subnet managers (OpenSM) and fabric diagnostic tools.
Solid understanding of BGP, OSPF, EVPN-VXLAN, MPLS (good to have).
Experience in HPC, AI/ML cluster networking environments is highly preferred.
Familiarity with Linux networking and troubleshooting tools.
Experience with automation (Python, Ansible) is a plus.
Preferred Qualifications
Experience supporting large GPU clusters.
Knowledge of storage networking (NVMe-oF, parallel file systems).
Experience with monitoring tools and telemetry systems.
Networking certifications (CCNP/CCIE or equivalent).
Key Competencies
Strong analytical and troubleshooting skills.
Ability to work in high-performance, mission-critical environments.
Excellent documentation and communication skills.
Proactive problem-solving mindset.