Senior Network Operations Engineer
Senior Network Operations Engineer, Core 42, Abu Dhabi - UAE
Core 42, a leader in AI-powered cloud and digital infrastructure, is driving transformative technology solutions globally. Leveraging advanced resources and partnerships, Core 42 empowers clients to harness sovereign AI infrastructure, especially in sectors with stringent regulatory needs. With a mission to redefine digital transformation, we combine sovereign capabilities with scalable, high-performance compute infrastructure, positioning ourselves at the forefront of AI innovation in the Middle East and beyond.
We are seeking a Senior Engineer – Network Operations to manage and support the network infrastructure underpinning our global high-performance computing (HPC) environments. This role is responsible for ensuring high availability, security, and optimal performance of switches, firewalls, and network fabrics that support large-scale AI and ML workloads across geographically distributed data centers. The ideal candidate brings deep hands‑on experience with enterprise‑grade network technologies, low‑latency HPC fabrics (e.g., InfiniBand), and automation of network operations.
Responsibilities
- Provide daily operational support of HPC network infrastructure, including Layer 2/3 switches, routers, firewalls, and RDMA-based fabrics (e.g., InfiniBand, RoCE), ensuring network performance and reliability.
- Troubleshoot and resolve complex network issues affecting HPC workloads and services, minimizing downtime and maximizing throughput.
- Configure, upgrade, and maintain enterprise‑grade firewalls, VPNs, ACLs, and routing protocols (e.g., BGP, OSPF), ensuring network security and performance.
- Provide network integration support for HPC platforms, including Slurm, Kubernetes, and bare‑metal provisioning systems.
- Design and manage IP address planning, VLAN configurations, network segmentation, and security zones in alignment with operational and compliance requirements.
- Develop and maintain network automation scripts and infrastructure‑as‑code solutions (e.g., Ansible, Python, Terraform) to optimize processes and reduce human error.
- Document network architecture, configurations, runbooks, and change management procedures in accordance with ITIL/ISO standards.
- Participate in on‑call rotations, providing support for incident response, change management, and root cause analysis (RCA) processes.
- Conduct root cause analysis (RCA) for operational network issues, contributing to post‑mortem documentation and driving continuous improvement efforts.
- Provide mentorship and technical guidance to junior engineers, helping to build skills and foster a collaborative environment.
Qualifications
- Bachelor’s degree in Network Engineering, Computer Science, or a related field; or equivalent hands‑on experience.
- Minimum of 5 years of experience in enterprise network operations or engineering roles.
- Extensive hands‑on experience with data center networking equipment (e.g., Cisco, Arista, Juniper, Mellanox, or NVIDIA Networking).
- Deep understanding of Layer 2/3 protocols, the TCP/IP stack, multicast, QoS, and VLAN/VXLAN/EVPN technologies.
- Proficiency in configuring and managing firewalls (e.g., Palo Alto, Fortinet, Cisco ASA) and VPN solutions to ensure secure network operations.
- Proven experience in supporting low‑latency, high‑throughput networks in HPC, AI/ML, or cloud‑scale environments.
- Hands‑on experience with InfiniBand or RoCE technologies for HPC network environments.
- Familiarity with Kubernetes networking (e.g., CNI plugins, network policies, service meshes) for cloud‑native networking.
- Exposure to CI/CD, Git, and modern DevNet practices for automating and optimizing network infrastructure.