Cloud Hardware Development Engineer, Cloud AI/ML/storage server teams

Job in Cupertino, Santa Clara County, California, 95014, USA
Listing for: Amazon
Full Time position
Listed on 2026-06-06
Job specializations:
  • Engineering
    Systems Engineer
Salary/Wage Range or Industry Benchmark: 80000 - 100000 USD Yearly
Job Description & How to Apply Below

Amazon Data Services, Inc.

As a Cloud Hardware Development Engineer, you will be an end-to-end owner of storage and/or accelerator (AI/ML/GPU) server platforms — from New Product Introduction (NPI) through fleet health in production.

You will work closely with internal customers to understand technical needs and business goals, leveraging your experience to architect solutions at scale.

In this role you will collaborate with component, firmware, power, mechanical, electrical, test, qualification, and manufacturing engineers, and lead our ODM partners to bring servers to the data center. After launch, you will monitor quality, drive reliability improvements, and ensure ongoing operational excellence.

Key Responsibilities
  • NPI – New Product Introduction
    • Own the end‑to‑end NPI lifecycle for storage and/or accelerator server platforms—from architecture definition through design, qualification, manufacturing ramp, and launch.
    • Lead technical solutions for complex server and rack system architectural challenges.
    • Work with ODM / manufacturing partners to develop, validate, and manufacture server products at scale.
    • Develop functional specifications, design verification plans, and test procedures.
    • Drive qualification and readiness milestones, ensuring new platforms meet performance, reliability, and cost targets before fleet deployment.
    • Identify and resolve technical risks early in the development cycle to prevent problems from reaching production.
  • Fleet Health, Diagnostics & Automation
    • Own fleet health for launched server platforms, with responsibility extending beyond shipment.
    • Design and implement predictive failure detection systems using telemetry, sensor data, error trending, and log correlation.
    • Drive toward zero‑touch operations: build detection, diagnosis, and remediation of faults without human intervention.
    • Debug complex system failures in time‑sensitive settings and perform root‑cause analysis across firmware, kernel, driver, thermal, power, and physical layers.
  • Systems Design & Technical Depth
    • Apply expertise across hardware, software, system design, x86 architecture, and operations.
    • Design and implement solutions to address system‑level issues at large scale.
    • Collaborate with hardware, software, manufacturing, supply chain, and product teams.
  • Cross‑Team Collaboration
    • Work closely with internal customers to ensure new hardware meets data path and control path requirements.
    • Identify potential problems early when onboarding servers into customer ecosystems.
    • Partner with datacenter operations to close the loop between field failures and design improvements.
A Day in the Life

Your day-to-day work includes interfacing with internal and external customers, reviewing platform designs with ODMs, analyzing logs in depth, and chasing fleet failures. The role spans a range of responsibilities that will continually challenge you.

Basic Qualifications
  • Experience in developing functional specifications, design verification plans, and functional test procedures.
  • Bachelor's degree or higher in electrical engineering, computer engineering, or an equivalent field.
  • Proficient English‑language communication skills, both written and verbal.
  • Experience in design, innovation, and research & development.
  • Knowledge of operating systems, hardware, storage, network, security, database administration, and cloud infrastructure.
  • Experience with server technologies such as thermal, mechanical, power, and signal integrity.
  • 5+ years of professional (non‑internship) work experience.
Preferred Qualifications
  • 5+ years of hardware design and validation of components, subsystems, and systems.
  • Experience with server technologies: board design, high‑speed bus design, signal integrity, failure analysis, CPU, GPU, SSD, memory, BIOS, BMC, and networking.
  • Experience developing and executing test procedures for mechanical or electrical systems.
  • Experience working with ODMs throughout product development and manufacturing lifecycle.
  • Experience building predictive failure detection or proactive remediation systems at fleet scale.
  • Experience with storage/compute/GPU/accelerator platforms—including integration, diagnostics,…