Senior Storage Benchmarking Engineer, Santa Clara area, California, USA (Engineering)

We’re in an unbelievably exciting area of tech and are fundamentally reshaping the data storage industry. Here, you lead with innovative thinking, grow along with us, and join the smartest team in the industry.

This type of work—work that changes the world—is what the tech industry was founded on. So, if you're ready to seize the endless opportunities and leave your mark, come join us.

THE ROLE

The Storage Benchmarking Engineer will design, execute, and analyze performance tests spanning both industry-standard storage benchmarks (fio, vdbench, SPEC SFS 2020, IO500, SPC-1/SPC-2) and the emerging class of AI/ML storage workloads (MLPerf Storage, DLIO, and GPU-driven training/inference data pipelines). Because AI has made storage a first-order bottleneck in the GPU data path, this role sits at the intersection of high-performance storage and large‑scale AI infrastructure.

This position demands strong end‑to‑end performance troubleshooting across the entire stack: compute (including GPUs), network (including RDMA/InfiniBand and high‑speed Ethernet), and storage. It also requires close collaboration across engineering, product management, marketing, and sales. The ideal candidate has hands‑on experience with both classic storage benchmarks and AI data‑pipeline benchmarking, a track record of engaging with benchmark standards organizations and communities, and exceptional communication and writing skills.

WHAT YOU'LL DO

Configure and scale the HPC/AI lab environment so all systems — including GPU servers, high‑speed fabrics, and storage — achieve maximum efficiency and scale across a variety of test harnesses. Build robust automation so labs can be rapidly configured and reconfigured to meet the demands of different benchmarks.
Design and execute storage performance benchmarks using industry‑standard tools and methodologies, including fio, vdbench, SPEC SFS 2020, IO500, and SPC-1/SPC-2 (or similar).
Design and execute AI/ML storage benchmarks, including MLPerf Storage, DLIO, and representative AI workloads — model training and checkpointing, inference and data ingest, RAG/vector‑database access patterns, and GPU‑driven I/O paths (e.g., GPUDirect Storage, NFS/RDMA). Characterize storage behavior against reference architectures such as NVIDIA DGX/SuperPOD and BasePOD.
Perform end‑to‑end performance troubleshooting and debugging across compute, GPU, network, and storage components to pinpoint and resolve bottlenecks and achieve best‑in‑class results.
Develop and maintain automated benchmarking workflows using tools like Ansible, Python, or Bash to ensure rapid provisioning and efficient, repeatable, reproducible results.
Analyze benchmark results, generate detailed reports, and deliver actionable insights to engineering teams for product optimization.
Collaborate with engineering, product management, marketing, and sales to align benchmarking efforts with product goals and customer needs.
Engage directly with benchmark standards organizations (e.g., SPEC, SNIA, MLCommons) and communities to influence methodologies, drive submissions, and stay ahead of industry and AI infrastructure trends.
Deliver high‑impact presentations to internal teams, customers, and external stakeholders, translating complex technical data into clear narratives.
Write technical marketing documents, whitepapers, and performance summaries to support product launches and customer engagement.
Maintain comprehensive documentation of benchmarking processes, configurations, and results.
We are primarily an in‑office environment, and you will be expected to work from the Santa Clara office in compliance with Everpure's policies, unless you are on PTO, work travel, or other approved leave.

WHAT YOU BRING

Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field (or equivalent experience).
5+ years of experience in storage performance benchmarking or a related technical role.
Proven expertise with storage benchmarking tools, including:
Experience benchmarking AI/ML storage workloads, such as MLPerf Storage, DLIO, or characterizing storage for GPU‑based training and inference pipelines (data ingest, checkpointing, GPUDirect…