Linux System/Platform Engineer
Listed on 2025-12-01
IT/Tech
Systems Engineer, Cloud Computing
The National Energy Research Scientific Computing Center (NERSC) is seeking a versatile Linux System / Platform Engineer to join our team building and managing Linux-based infrastructure.
More than ever, scientific discovery transforms our world. NERSC is at the forefront, operating some of the world's largest supercomputers for thousands of researchers who use computational power to solve society's most challenging problems.
In this exciting role, you will help build and manage our container and virtual machine platforms and use them to deploy systems that keep our supercomputing center running smoothly and help researchers make the most of its resources, including API endpoints, scientific research tools, authentication, identity and access management, databases, and more. You'll join a group of systems and software engineers and will routinely work with other groups across NERSC on a variety of projects.
You'll also collaborate with our counterparts at peer scientific facilities, also operated by the Department of Energy Office of Science, to streamline cutting-edge research using automation and cloud-native and AI tools and techniques.
If you are interested in science, have Linux experience, and would enjoy working in a fast-paced, creative environment with a bright and diverse group of colleagues and a beautiful view from our building in the Berkeley Hills, we want to hear from you!
What You Will Do, at Level 3:
Work with a team to build and manage Linux systems and storage infrastructure.
Troubleshoot and solve complex technical problems with other team members.
Install, upgrade, and secure equipment and services.
Develop and refactor scripts and other code.
Participate in a 24x7 on-call rotation.
Coordinate small project teams or other initiatives (such as the rollout of a new service or system, or a major equipment or software upgrade).
Work with vendors to prioritize efforts and enhance their technologies to meet user needs.
Work with researchers to deploy services using Spin, our container cloud platform based on Kubernetes.
Collaborate within NERSC and across the DOE community to develop services and integrate them into the new NERSC supercomputer, Doudna, into the NERSC data center environment, and across multiple DOE facilities.
Present developments to NERSC staff and the broader HPC community at science conferences and industry meetings.
Analyze and solve complex technical problems requiring in-depth evaluation of variable factors.
Work at a higher level of independence while carrying out work assignments.
Research, select, and lead the implementation of new technologies.
Develop team strategy and project plans.
Provide leadership and technical guidance to group members and other colleagues at NERSC.
Recommend and lead system improvement efforts that enhance system performance, reliability, and security.
Identify and evaluate emerging HPC technologies and features that could introduce novel capabilities or enhance existing system performance and utility.
Represent NERSC in technical or user advocacy groups to influence the HPC and DOE community to meet user needs.
Typically, 8+ years of related experience with a Bachelor's degree; alternatively, 6+ years with a Master's degree; or equivalent career experience.
4+ years of experience managing large-scale Linux-based system deployments in a high-performance computing, cloud computing, or hyper-scale environment.
Experience with some or all of our key technologies:
containers (such as Docker or Kubernetes)
virtualization (such as Proxmox or VMware)
cloud-based deployment (such as AWS, Azure, or GCP)
using and developing AI (or machine learning) tools and services
identity and access management
database administration, tuning, and troubleshooting
networked storage systems
backup technologies
Familiarity with automated provisioning systems (such as Chef, Foreman, or Terraform).
Familiarity with configuration management systems (such as Ansible or Puppet).
Working knowledge of Linux system engineering and security practices.
Ability to resolve complex issues in creative and effective ways and derive technical solutions in a collaborative environment to meet end user requirements or needs.
Demonstrated ability to work independently as well as collaboratively on large projects, and to contribute to an active and respectful intellectual environment.
Creative, positive, and collaborative work style.
Excellent oral and written communication skills.
Typically, 12+ years of related experience with a Bachelor's degree; alternatively, 8+ years with a Master's degree; or equivalent career experience.
Experience in software engineering or complex scripting.
Experience managing network equipment.
Ability to lead and coordinate projects.
Ability to analyze and resolve significant and unique issues requiring evaluation of multiple intangible factors.
Ability to exercise independent judgment in methods, techniques and…