Senior HPC Cluster Systems Administrator

Job in Berkeley, Alameda County, California, 94709, USA
Listing for: Lawrence Berkeley National Laboratory
Full Time position
Listed on 2025-12-05
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer
Job Description & How to Apply Below

Berkeley Lab’s (LBNL) Information Technology Division (IT) has an opening for a Senior HPC Cluster Systems Administrator to join their Science IT Team!

In this exciting role, you will support the Berkeley Lab research community by building, integrating, and maintaining Linux-based resources, high-performance computing cluster systems, and Kubernetes clusters. This role provides extensive expertise in High Performance Computing infrastructure and delivers advanced Linux solutions to further scientific endeavors at Berkeley Lab. The mission of Scientific Computing under Science IT is to facilitate groundbreaking fundamental research globally by providing essential computing tools, networks, and expertise to enable pioneering science.

This position has an anticipated start date of January 5, 2026.

We’re here for the same mission, to bring science solutions to the world. Join our team and YOU will play a supporting role in our goal to address global challenges! Have a high level of impact and work for an organization associated with 17 Nobel Prizes!

We invest in our employees by offering a total rewards package you can count on:

  • Exceptional health and retirement benefits, including pension or 401K-style plans
  • A culture where you’ll belong - we are invested in our teams!
  • In addition to accruing vacation and sick time, we also have an annual Winter Holiday Shutdown
  • Parental bonding leave (for both mothers and fathers)
What You Will Do
  • Perform Linux system and HPC cluster maintenance and installations, operating system upgrades, system security hardening and intrusion detection, storage and file system management, system hardware, customization of user group working environment, troubleshooting, network monitoring, and crash recovery.
  • Design, deploy, and manage scalable applications using Kubernetes, ensuring the availability, performance, and readiness of the Kubernetes infrastructure.
  • Automate deployment, scaling, and management of containerized applications, and collaborate with DevOps and development teams to streamline CI/CD pipelines.
  • Design, deploy, and manage the global storage platform to ensure high performance, massive scalability, reliability, and future-proof solutions.
  • Support storage technologies such as Lustre and VAST, along with the associated networks.
  • Resolve I/O issues related to business applications, including diagnosing and resolving complex storage, Linux, and networking challenges in a fast-paced environment.
  • Research new storage management technologies and techniques, and provide recommendations.
  • Participate in developing system administration, security, and network policies, documentation, and tools oriented towards efficient systems management.
  • Participate in cluster support to staff and researchers, including initial installation, integration, and ongoing maintenance of Linux High-Performance Computing cluster systems. This includes travel to remote sites as needed.
  • Co-lead technical efforts with other senior system administrators in areas of HPC technologies such as job schedulers, high-performance interconnects, parallel file systems, cybersecurity, cluster management, container orchestration, VM infrastructure, networking, performance tuning, or data center planning.
  • Co-lead group projects of small to medium size and complexity to implement and deploy new computing technologies and associated services for the research community.
What We Are Looking For
  • A Bachelor’s Degree (or equivalent knowledge/training) in Computer Science, Engineering, or a related discipline, and a minimum of 12 years of relevant experience in Linux system administration within a large distributed computing environment, including experience providing systems and end-user support for multiple scientific or computational research groups or an equivalent combination of education and experience.
  • Demonstrated ability to manage large-scale, performance-critical environments, including capacity planning, scaling, and optimization.
  • Significant experience deploying, scaling, and managing Kubernetes clusters, with a strong understanding of its architecture (pods, deployments, services, ingress) and container orchestration. Proven…
Position Requirements
10+ Years work experience