HPC Storage Systems Group Leader
Listed on 2025-12-07
IT/Tech
Data Engineer, Data Analyst
The National Energy Research Scientific Computing Center (NERSC) is inviting applications for the position of Storage Systems Group (SSG) Lead. NERSC's mission is to accelerate scientific discovery through high performance computing and data analysis for the Department of Energy's (DOE) Office of Science programs. NERSC is searching for a knowledgeable and inspired group leader for the Storage Systems Group who will be responsible for developing NERSC's storage strategy based on NERSC's systems roadmap, science workflows and user needs.
They will provide vision and guidance to design, operate and simplify the storage environment for NERSC's 11,000+ users.
The SSG is responsible for NERSC's storage portfolio, including large-scale, high-capacity parallel file systems and archival storage systems, with an eye towards balancing performance, stability, and usability for NERSC's users, who operate in a wide variety of DOE mission areas and scientific domains. The SSG Lead provides technical leadership to a group of highly skilled storage engineers who collaborate with other teams at NERSC to deliver innovative solutions to complex problems and a technical vision for the future of NERSC storage platforms.
The NERSC storage environment that SSG is responsible for today is composed of multiple tiers:
- The NERSC hierarchical storage management system (presently High Performance Storage System (HPSS)) stores more than 450 PB of data for the scientific community and puts NERSC in the top 10 largest HPSS deployments globally.
- NERSC provides a large-scale parallel community file system (presently Storage Scale) with more than 150 PB of online storage to the user community on an RDMA over Converged Ethernet (RoCE) fabric.
- Home and common storage mounted via Storage Scale on several thousand nodes across NERSC.
In addition to the current environment, SSG will be responsible for the scratch and new quality-of-service storage systems in NERSC's latest GPU-based supercomputer, Doudna, which is to be operationalized in 2027. Doudna will deliver a tenfold increase in computing power to NERSC users along with new capabilities. The new Doudna environment will support larger and higher-resolution data sets coming from the scientific community's new sensors, detectors, sequencers and telescopes, and these data sets will need to be managed, shared and stored.
The Storage Systems Group Lead is responsible for understanding existing and emerging requirements, and for deploying storage solutions in collaboration with other NERSC teams to support NERSC's broad user base of today and tomorrow. In doing so, the SSG Lead will drive the development and implementation of a holistic storage strategy to support changing scientific workflows and new technologies as part of Doudna and future NERSC system roadmaps.
To accomplish this, the SSG Lead will be responsible for investigating new storage technologies and engaging with the vendor community on future roadmaps. The SSG Lead will work with the Data Center Department Head to provide guidance and priorities for the group based on NERSC's strategic plan and its goals.
- Develop NERSC's storage strategy based on NERSC's systems roadmap, science workflows and user needs.
- Lead a team that procures, installs, manages, supports and monitors NERSC's large-scale storage systems, including providing 24x7 support.
- Ensure NERSC's storage systems meet the needs of NERSC's 11,000+ users by providing high-performing, available, and usable systems.
- Work independently and as part of the Storage Systems Group to diagnose and fix storage problems, help analyze storage system issues, and develop and implement workarounds and/or patches for software bugs.
- Provide effective line management to a group of approximately 10 Computer Systems Engineers by hiring excellent staff and working closely with SSG staff members. Ensure staff are meeting goals, provide both positive and constructive feedback to staff and ensure all staff have career growth opportunities.
- Provide technical leadership for implementation and deployment efforts for storage system improvements that enhance task automation, reliability, stability, usability, performance, and security.
- Continuously evaluate new storage technologies and make recommendations on future storage strategy and directions for the center, including both parallel and hierarchical storage, that would create new capabilities and enhance storage and HPC system performance and usability.
- Work closely with other teams at NERSC to enable large-scale simulation, data analysis and AI applications to run on NERSC supercomputing and storage systems.
- Provide budgetary input and oversight for NERSC's storage systems.
- Lead or collaborate on efforts with other Department of Energy (DOE) Labs on future storage technologies, multi-lab storage efforts and other related topics.
- Present at conferences and talks to promote NERSC to other national labs and HPC sites.
- Create and develop a vision and strategy for the…