Principal GPU/CPU Systems Engineer
Listed on 2026-02-07
IT/Tech
Engineering
Systems Engineer, Hardware Engineer
Job Description
Required Qualifications
10 or more years of experience in hardware design, system engineering, and platform bring-up.
Hands-on experience with market-leading GPUs or AI platforms spanning development, bring-up, test, and characterization.
Strong knowledge of AI/GPU and/or AI/CPU platform architectures and capabilities.
Experience evaluating system architectures, platform definitions, and implementation paths.
Ability to balance hardware performance, power, cost, regulatory, and cross-functional requirements.
Experience with modern server platforms across x86 and ARM architectures.
Hardware development experience at the system, board, and FPGA levels.
Proficiency reviewing hierarchical schematics, advanced multilayer board layouts, and end-to-end interconnects.
Strong understanding of firmware and system diagnostics using BMC firmware, UEFI or BIOS, and Linux tools.
Experience scripting and customizing diagnostics, validation, and test workflows.
Experience with GPU supplier test code and open-source AI test and characterization tools.
Experience with system integration, validation, and performance characterization.
Strong understanding of high-speed buses and interconnects used in modern AI and compute platforms.
Demonstrated ability to debug and root-cause complex hardware and software issues.
Ability to document design intent and technical specifications clearly.
Strong communication skills with the ability to explain complex technical topics across engineering teams and executive audiences.
Proven ability to provide cross-functional technical leadership and collaborate effectively with internal teams and external partners.
Preferred Skills
Experience using hardware debuggers.
Experience with PCIe, DDR, Ethernet, USB, SPI, and related interfaces.
Experience with platform-level security technologies.
Experience with power circuit design and signal integrity.
Responsibilities
Platform Architecture and Definition
Participate in platform definition, architecture evaluation, and analysis for existing and next-generation Cloud AI platforms.
Evaluate system architectures, proposed implementations, and scaling and optimization strategies.
Review and assess third-party merchant silicon used for AI accelerator modules and GPU/CPU platforms.
Balance hardware performance priorities against power, cost, regulatory, and cross-functional requirements.
Drive definition, development, integration, debug, characterization, and tuning of AI hardware platforms.
Provide platform development oversight for internal teams and third-party partners.
Work with in-house engineering experts on design reviews, schematics, board layout, and implementation decisions.
Document and specify design intent and technical details in collaboration with engineering teams.
Guide and support system integration, system test, qualification, and characterization.
Define and oversee system validation plans, diagnostics features, and test strategies.
Develop and expand system characterization and performance testing capabilities.
Utilize supplier-provided and approved open-source AI platform qualification and test tools.
Support definition of in-service system monitoring, error reporting, and operational health visibility.
Collaborate with GPU and AI chip suppliers, system architects, firmware developers, and hardware engineers.
Partner with storage, networking, compute, quality, security, cloud orchestration, and manufacturing teams.
Support development program managers with technical assessments and planning.
Assist manufacturing teams to ensure hardware is secure, robustly evaluated, and production-ready.
Participate in hardware platform security evaluations.
Guide internal teams and partners on scaling, monitoring, and deploying AI platforms into the cloud.
Serve as a senior technical advisor to Oracle hardware, software, cloud, and support teams.
Act as the final level of engineering support for complex deployed product issues.
Assist with root-cause analysis through lab replication, remote debug, and…