Network System Design Engineer - Data Center GPU
Listed on 2026-02-28
IT/Tech
Systems Engineer, IT Support, Hardware Engineer
Overview
WHAT YOU DO AT AMD CHANGES EVERYTHING
At AMD, our mission is to build great products that accelerate next-generation computing experiences, from AI and data centers to PCs, gaming and embedded systems. Grounded in a culture of innovation and collaboration, we believe real progress comes from bold ideas, human ingenuity and a shared passion to create something extraordinary. When you join AMD, you’ll discover the real differentiator is our culture.
We push the limits of innovation to solve the world’s most important challenges—striving for execution excellence, while being direct, humble, collaborative, and inclusive of diverse perspectives. Join us as we shape the future of AI and beyond.
Together, we advance your career.
The Role
We are seeking an engineer who will thrive in a fast-paced work environment, using effective communication, problem-solving and prioritization skills. Individuals who are well organized, show great attention to detail, and employ critical thinking are well suited for our team.
The Datacenter Graphics and Accelerated Computing (DCGPU) organization is looking for an experienced network system level debug engineer focused on Datacenter environments. The role involves driving weekly production level parts through specific validation that includes stress testing, Technical Data Package verification (clocks, frequency, power), and BOM/EC verification in various network configurations. The individual will need to drive to root cause closure on issues and communicate with the different IP layers for resolution.
The Person
This AMD team is looking for a senior level person who can guide the team, mentor upcoming developers, provide long-range strategy, and help resolve issues quickly. You will be involved in performance, automation, and development. The right candidate will stay informed on the latest trends and provide consultative direction to senior management. The person should be experienced in debugging complex hardware/firmware issues, understand the flow of a GPU through the different layers of an SOC and system, and communicate effectively with owners of the code stack to drive issues to resolution.
Key Responsibilities
- A strong desire to learn new skills and understand new features as they are added
- Proven ability to work within and across groups
- Effective communication skills
- Identify opportunities to improve the product
- Collaborate with team members to understand design architecture and propose solutions
- Debug/triage engineer for a new quality initiative
- Understand GPU/System level HW and SW flow
- Provide leadership in driving issues and bugs to root cause
- Document flows and methods of debuggability
- Embedded coding for hardware components and drivers for network components
- Assist with network prototypes and in-depth testing to validate the design
- Define platform level validation test plans based on product/customer needs
- Troubleshoot and resolve platform network issues
- Provide customer support regarding network architectural questions, prerequisites, and product features
- Interface with networking partners and software/hardware engineers
- Work with software developers on network performance enhancement
- Exposure to systems architecture
- Minimum 10 years of experience in system or SOC level debug and triage
- Proven ability to drive resolution of critical problems within a lab or Datacenter
- Experience with external customers/partners and resolving problems in Data Centers
- Experience with manufacturing issues/failures and coordinating with external partners
- 8+ years working with network technologies in Datacenter environments
- Experience with modern networking standards
- Experience with mesh routing and switching protocols
- Familiar with Ethernet and InfiniBand designs and switch topologies
- Linux as a development environment
- Familiar with Ethernet and InfiniBand networking in Linux and Windows
- Familiar with Virtualization (KVM and Hyper-V)
- RDMA network configuration and troubleshooting
- Linux kernel networking expertise
- System/Platform level debugging tools
- Experience with networking environments for HPC/ML/DL workloads
- Hands-on experience with lab equipment…