
System Infrastructure/Platform Engineer, HPC Technology Department

Job in Berkeley, Alameda County, California, 94709, USA
Listing for: Berkeley Lab
Full Time position
Listed on 2026-06-12
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing, IT Support, Cybersecurity
Salary/Wage Range or Industry Benchmark: 60000 - 80000 USD Yearly

Overview

The National Energy Research Scientific Computing Center (NERSC) is seeking a System Infrastructure / Platform Engineer to help build and manage HPC systems and Linux-based infrastructure. NERSC operates some of the world’s largest supercomputers, supporting thousands of researchers tackling major scientific challenges.

In this role, you will manage high-performance computing environments, including HPC systems, containers, virtual machines, and core infrastructure services. You’ll work with cutting-edge technologies such as CPU/GPU clusters, parallel storage, high-speed networking, Slurm, and Kubernetes, balancing innovation with reliability, performance, and security. You will collaborate with engineers, researchers, vendors, and open-source communities to develop scalable solutions that advance scientific discovery and the future of HPC.

What You Will Do
  • Build and manage Linux systems and storage infrastructure
  • Troubleshoot complex technical issues with team members
  • Install, upgrade, and secure systems and services
  • Develop and maintain scripts and automation tools
  • Participate in a 24/7 on-call rotation
  • Lead small projects, upgrades, and service rollouts
  • Collaborate with vendors to improve technologies and user experience
  • Support reliable operations of NERSC’s Perlmutter supercomputer and Spin Kubernetes platform
  • Develop and integrate services across NERSC and DOE facilities, including the upcoming Doudna supercomputer
  • Present technical work to the HPC community at conferences and industry events
Responsibilities
  • In addition to Level 3 responsibilities, Level 4 adds:
    Solve complex technical problems with independent judgment; develop team strategies and project plans; provide technical leadership and mentorship; lead system improvements for performance, reliability, and security; evaluate emerging HPC technologies; represent NERSC in HPC and DOE technical communities and advocacy groups.
What is Required to be hired at a Level 3
  • Typically, 8+ years of related experience with a Bachelor’s degree; alternatively, 6+ years with a Master’s degree; or equivalent career experience
  • 4+ years of experience managing large-scale Linux-based system deployments in a high-performance computing, cloud computing, or hyper-scale environment
  • Mastery of Linux concepts and operations (processes, networking, system logs, performance)
  • Proficiency with bash and Python scripting
  • Experience with some or all of our key technologies:
    • containers (such as Docker or Kubernetes)
    • virtualization (such as Proxmox or VMware)
    • cloud-based deployment (such as AWS, Azure or GCP)
    • identity and access management
    • database administration, tuning, and troubleshooting
    • storage systems technologies (such as iSCSI and NAS appliances)
    • parallel file systems (such as Lustre, GPFS, or VAST)
    • high-speed networking/interconnect (such as InfiniBand, Slingshot, or RoCE)
    • advanced performance analysis and debugging tools (such as strace, lsof, eBPF, or gdb)
    • DevOps tools (such as GitLab or Jira) and processes (such as issues, merge requests, and API/automation)
  • Familiarity with automated provisioning systems (such as Chef, Foreman, or Terraform)
  • Familiarity with configuration management systems (such as Ansible or Puppet)
  • Working knowledge of Linux system engineering and security practices
  • Ability to resolve complex issues in creative and effective ways and derive technical solutions in a collaborative environment to meet end user requirements or needs
  • Demonstrated ability to work independently as well as collaboratively in large projects, and contribute to an active and respectful intellectual environment
  • Creative, positive, and collaborative work style
  • Excellent oral and written communication skills
Requirements
  • Additional Requirements to be hired at a Level 4:
    • Typically, 12+ years of related experience with a Bachelor’s degree; alternatively, 8+ years with a Master’s degree; or equivalent career experience
    • Proven ability to lead troubleshooting and resolution of high-impact incidents in complex, large-scale environments
    • Demonstrated leadership in cross-team collaboration and mentoring
    • Experience in software engineering, Linux systems programming, or complex scripting
    • Experience…