Senior HPC Systems Engineer: Classified Exascale Linux
Listed on 2026-05-27
IT/Tech
Systems Engineer, Cybersecurity, Cloud Computing, IT Support
The Field Intelligence Operations Division invites candidates to join our National Security Computing team and contribute to the design, implementation, and management of High‑Performance Computing (HPC) systems within a classified environment. We are looking for candidates with experience in HPC architecture, cluster management, and parallel computing, and a proven ability to work within highly secure and regulated environments. This role involves close collaboration with security teams, scientists, and IT leadership to ensure that the HPC infrastructure meets the stringent performance, security, and compliance requirements of classified work.
As part of our team, you will be joining a vibrant group of professionals eager to provide premier customer service to ensure people and information technology remain secure. The team is collaborative and strives to ensure security practices and procedures are understood, implemented, and enforced. All team members deliver ORNL’s mission by aligning behaviors, priorities, and interactions with our core values of Impact, Integrity, Teamwork, Safety, and Service.
This evergreen posting represents multiple openings for roles across ORNL’s high-performance computing classified ecosystem.
Major Duties and Responsibilities
- Provide technical leadership in the design, integration, and administration of large-scale Linux-based HPC clusters, high‑speed networks, and storage systems.
- Lead medium to large technical projects, coordinating requirements, schedules, and deliverables across internal and external stakeholders.
- Architect and deploy advanced infrastructure solutions supporting exascale‑class and mission‑critical computing environments.
- Serve as a technical mentor for HPC engineers, guiding best practices in automation, performance tuning, and system security.
- Develop, implement, and maintain configuration management and automation frameworks (e.g., Ansible, Puppet, Salt) to enhance reliability and reproducibility.
- Perform advanced system performance analysis, troubleshooting, and optimization, ensuring system scalability and long‑term sustainability.
- Manage critical vendor and partner relationships, representing ORNL’s technical requirements during procurement, integration, and system acceptance.
- Contribute to strategic planning and technology roadmaps, influencing unit goals and technical direction.
- Collaborate closely with scientists, researchers, and IT specialists to align infrastructure capabilities with research and security objectives.
- Ensure compliance with DOE cybersecurity standards, configuration baselines, and operational policies.
- Author technical documentation, present internal briefings, and communicate complex issues and resolutions to management and stakeholders.
- Participate in on‑call rotations, maintenance windows, and incident response as needed to support 24x7 operations.
Qualifications
- BS degree in computer science, engineering, or a related technical field.
- A minimum of 8 years of relevant experience in Linux systems administration or HPC systems engineering.
- Demonstrated experience leading the design and deployment of HPC or large‑scale distributed computing systems.
- Expertise with batch schedulers (SLURM, PBS, LSF) and parallel file systems (Lustre, GPFS/Spectrum Scale).
- Proven ability to lead technical projects from concept through implementation, balancing technical depth with project delivery.
- Strong proficiency in automation and infrastructure‑as‑code frameworks (Ansible, Puppet, Salt).
- Advanced scripting or programming skills (Python, Bash, Go) for automation and operational tooling.
- In‑depth understanding of high‑speed interconnects (InfiniBand, Slingshot, Ethernet) and storage architectures.
- Experience managing identity and access management systems, including MFA, SSO, and zero‑trust frameworks (PingFederate, RSA SecurID, Entra).
- Experience integrating virtualization or containerization solutions (VMware, KVM, Apptainer, Podman) into HPC environments.
- Ability to manage client and stakeholder relationships across multiple directorates and technical disciplines.
- Excellent written and verbal…