Hardware Systems Engineer
Listed on 2026-02-21
Engineering
Systems Engineer, Hardware Engineer
Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI — without sacrificing scale, speed, or sustainability.
Be a part of the AI revolution with sustainable technology. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.
About This Role
We are seeking a Hardware Production / Sustaining Engineer to strengthen Crusoe’s Hardware Systems Engineering team and close critical skill gaps in debugging, validation, and production support of high-performance compute systems. In this role, you will take ownership of the full hardware lifecycle, from prototype bring‑up to large-scale production, while driving automation, deep issue resolution, and reliability across Crusoe Cloud’s GPU- and CPU-based infrastructure.
You will work closely with cross-functional teams to support, debug, and improve hardware platforms at scale, with a particular focus on PCIe, InfiniBand, and NVMe/storage, which have been identified as essential areas for deeper expertise. Your work will directly impact Crusoe’s ability to deploy and operate sustainable, AI-first compute systems with world-class performance and reliability.
What You’ll Be Working On
- Drive the full hardware development and sustaining lifecycle, including feasibility, bring‑up, validation, deployment, and ongoing production support.
- Develop and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability improvements.
- Lead deep troubleshooting and debugging across:
  - PCIe (link training, topology, performance issues)
  - InfiniBand (fabric debugging, throughput, connectivity issues)
  - NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)
- Conduct rigorous system validation and characterization for GPU, CPU, and high-performance compute platforms.
- Support end-to-end integration and solution testing to ensure Crusoe Cloud products meet performance, reliability, and scalability expectations.
- Collaborate with mechanical, thermal, firmware, software, and manufacturing teams to resolve system‑level issues and enable stable production operation.
- Drive prototyping, qualification, and readiness for high-volume manufacturing with both internal teams and external vendors.
- Identify opportunities for new hardware technologies, testing methods, and sustainability improvements aligned with Crusoe’s long‑term objectives.
- Provide data-driven insights to influence Crusoe’s hardware roadmap and reliability strategy.
What You’ll Bring to the Team
- 8–10+ years of experience in hardware development, validation, sustaining engineering, or production engineering.
- Strong hands‑on expertise in PCIe, InfiniBand, and NVMe/storage debugging and development.
- Deep proficiency in hardware bring‑up, board-level debugging, and system-level validation.
- Ability to design and implement automation frameworks for hardware testing (Python, Shell, or similar).
- Technical background in digital and analog design, server architecture, and high-performance compute hardware.
- Experience working across thermal, mechanical, firmware, and software functions in multidisciplinary environments.
- Strong analytical and problem-solving skills with a data-driven approach.
- Excellent communication and collaboration skills for working with internal teams and external partners.
- Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or equivalent experience.
Bonus Points
- Experience designing or optimizing GPU-to-GPU communication architectures for AI/ML workloads.
- Direct experience integrating NVLink or other next-generation GPU interconnect technologies.
- Familiarity with cutting-edge GPU architectures and how to leverage them in AI/HPC environments.
- Expertise supporting or designing systems across both ARM and x86 server architectures.
- Background in sustainable or energy-efficient hardware design practices.
- Advanced certifications or coursework in AI/HPC hardware systems.
Benefits
- Industry competitive pay
- Restricted Stock Units in a fast-growing, well-funded…