Senior Hardware Systems Engineer
Listed on 2026-02-23
Engineering
Systems Engineer, Hardware Engineer
Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.
Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.
About This Role
We are seeking a Hardware Production / Sustaining Engineer to strengthen Crusoe’s Hardware Systems Engineering team and close critical skill gaps in debugging, validation, and production support of high‑performance compute systems. In this role, you will take ownership of the full hardware lifecycle, from prototype bring‑up to large‑scale production, while driving automation, deep issue resolution, and reliability across Crusoe Cloud’s GPU‑ and CPU‑based infrastructure.
You will work closely with cross‑functional teams to support, debug, and improve hardware platforms at scale, with a particular focus on PCIe, InfiniBand, and NVMe/storage, which have been identified as essential areas for deeper expertise. Your work will directly impact Crusoe’s ability to deploy and operate sustainable, AI‑first compute systems with world‑class performance and reliability.
What You’ll Be Working On
- Drive the full hardware development and sustaining lifecycle, including feasibility, bring‑up, validation, deployment, and ongoing production support.
- Develop and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability improvements.
- Lead deep troubleshooting and debugging across:
  - PCIe (link training, topology, performance issues)
  - InfiniBand (fabric debugging, throughput, connectivity issues)
  - NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)
- Conduct rigorous system validation and characterization for GPU, CPU, and high‑performance compute platforms.
- Support E2E integration and solution testing to ensure Crusoe Cloud products meet performance, reliability, and scalability expectations.
- Collaborate with mechanical, thermal, firmware, software, and manufacturing teams to resolve system‑level issues and enable stable production operation.
- Drive prototyping, qualification, and readiness for high‑volume manufacturing with both internal teams and external vendors.
- Identify opportunities for new hardware technologies, testing methods, and sustainability improvements aligned with Crusoe’s long‑term objectives.
- Provide data‑driven insights to influence Crusoe’s hardware roadmap and reliability strategy.
- 8–10+ years of experience in hardware development, validation, sustaining engineering, or production engineering.
- Strong hands‑on expertise in PCIe, InfiniBand, and NVMe/storage debugging and development.
- Deep proficiency in hardware bring‑up, board‑level debugging, and system‑level validation.
- Ability to design and implement automation frameworks for hardware testing (Python, Shell, or similar).
- Technical background in digital and analog design, server architecture, and high‑performance compute hardware.
- Experience working across thermal, mechanical, firmware, and software functions in multidisciplinary environments.
- Strong analytical and problem‑solving skills with a data‑driven approach.
- Excellent communication and collaboration skills for working with internal teams and external partners.
- Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or equivalent experience.
- Experience designing or optimizing GPU‑to‑GPU communication architectures for AI/ML workloads.
- Direct experience integrating NVLink or other next‑generation GPU interconnect technologies.
- Familiarity with cutting‑edge GPU architectures and how to leverage them in AI/HPC environments.
- Expertise supporting or designing systems across both ARM and x86 server architectures.
- Background in sustainable or energy‑efficient hardware design practices.
- Advanced certifications or coursework in AI/HPC hardware systems.
- Industry competitive pay
- Restricted Stock Units in a fast‑growing, well‑funded…