AI Systems Administration Specialist
Listed on 2026-05-31
IT/Tech
Systems Engineer, Cloud Computing
Overview
The Argonne Leadership Computing Facility (ALCF) is home to one of the world's pioneering exascale supercomputers, Aurora. With its extraordinary computing speed and advanced artificial intelligence capabilities, Aurora is set to revolutionize scientific research. ALCF is dedicated to supporting high-performance computing (HPC) and adjacent services that are crucial to the research workflow. The ALCF is seeking a skilled Systems Administration Specialist to join their team to support the AI Testbed, which evaluates emerging hardware and software platforms for artificial intelligence and machine learning for science.
Responsibilities
- Work directly with first‑class systems alongside scientific staff and research colleagues within the division.
- Serve as a systems administrator on Argonne’s AI Testbed, installing and managing diverse AI and machine‑learning hardware and software.
- Work directly with other subject‑matter experts to ensure the sustainability and availability of the testbed infrastructure.
- Support machines in a mixed operating‑system environment and work efficiently with other operations groups.
- Provide guidance so researchers can rely on the environment, keeping research productive.
- Work in a hybrid environment with 2+ days onsite in Lemont, Illinois; fully onsite if preferred.
Skills and Qualifications
- Experience in UNIX systems administration, especially Linux, with an emphasis on OS installation and upgrading, package building and management, common services and applications, and troubleshooting.
- Experience with Salt, Ansible or similar configuration‑management tools.
- Experience with Git or other modern version control platforms.
- Effective problem‑solving skills.
- Working knowledge of scripting languages, particularly Python.
- Ability to write concise documentation.
- Ability to work effectively as a member of a team.
- Flexibility in handling assignments and working on several projects simultaneously.
- Knowledge and understanding of safe operation within a datacenter, including mounting and unmounting server hardware.
- Ability to handle physical labor of installing racks and servers in a datacenter, lifting up to 20 pounds independently.
- Understanding of IPv4 networking.
- Ability to model Argonne’s core values: impact, safety, respect, integrity and teamwork.
- Proof of U.S. citizenship required.
Preferred Skills and Qualifications
- Knowledge of AI/ML systems architecture and workflows (Groq, Cerebras, Graphcore, SambaNova).
- Working knowledge of Kubernetes management.
- Knowledge of scientific applications.
- Knowledge of high‑performance networking technologies such as InfiniBand and Slingshot.
- Knowledge of storage‑area networking and storage arrays, such as NetApp.
- Knowledge of parallel and distributed file systems such as Lustre and related hardware.
- Knowledge of high‑performance computing techniques, graphics, and visualization.
- Experience with software packaging, building from source, and dynamic linking.
- Understanding of MPI and its implementations.
- Ability to gather site requirements and represent them to design and development teams to find appropriate solutions across multiple sites.
- Ability to independently assess requirements, identify tasks, and coordinate with peers to accomplish goals.
- Experience implementing CI or CD workflows.
This position can be hired at one of two levels; the selected candidate will be placed at the appropriate level (PT3 or PT4) depending on depth and breadth of knowledge and skills.
PT3 – Bachelor’s degree and 4+ years of experience, or a Master’s degree and 2+ years of experience, or equivalent. Pay range: $86,299 – $134,626.
PT4 – Bachelor’s degree and 6+ years of experience, or a Master’s degree and 4+ years of experience, or equivalent. Pay range: $106,455 – $166,070.
Benefits
Extensive benefits are part of the total rewards package.
EEO & Legal
As an equal employment opportunity employer, and in accordance with our core values of impact, safety, respect, integrity and teamwork, Argonne National Laboratory is committed to a safe and welcoming workplace that fosters collaborative scientific discovery and innovation. Argonne encourages…