HPC Linux Storage Engineer Job, Oak Ridge area, Tennessee, USA, IT/Tech

Oak Ridge National Laboratory (ORNL), home to some of the world’s most powerful supercomputers, is seeking highly skilled professionals to support large-scale storage systems, high-speed parallel file systems, and archival solutions critical to advancing scientific discovery and innovation. As part of ORNL’s leadership-class computing ecosystem, you will play a vital role in designing, deploying, optimizing, and maintaining infrastructure that powers cutting-edge research across diverse scientific domains.

This evergreen posting represents multiple opportunities across ORNL’s high-performance computing (HPC) environment, supporting scalable, reliable, and secure computing and storage capabilities. Applications are reviewed on an ongoing basis as new positions become available to meet the dynamic needs of our world-class computing facility.

Job Duties and Responsibilities May Include:

Design and Management of Infrastructure: Architect, deploy, and manage large-scale storage systems and HPC platforms to support research, scientific, and enterprise workloads. Develop and implement solutions for structured, unstructured, and archival data storage, focusing on scalability, reliability, and performance.
Systems Analysis and Development: Apply systems analysis techniques to consult with users/customers, determine functional requirements, and design, test, or optimize storage and computational solutions tailored to their needs. Develop, document, and modify solutions, including system prototypes and automated workflows, to enhance operational efficiency.
Performance, Optimization, and Troubleshooting: Ensure the performance, availability, scalability, and security of diverse infrastructure environments. Diagnose and resolve complex operational challenges quickly and effectively, applying advanced performance optimization techniques for a wide range of workloads.
Collaboration and Best Practices: Work closely with stakeholders from research, technical, and operational teams to understand workflows, identify opportunities for improvement, and deliver effective solutions. Define, implement, and enforce best practices, standards, and procedures across projects and teams.
Automation and Innovation: Automate system configuration, provisioning, monitoring, and maintenance to reduce manual effort and downtime. Evaluate emerging technologies and tools to continuously improve system capabilities, adapt to changing needs, and plan for future advancements.
Support and Maintenance: Support critical infrastructure through participation in a 24/7 on-call rotation and off-hours maintenance windows. Resolve hardware and software issues in coordination with vendors, ensuring minimal impact on operations.

Basic Qualifications

Bachelor’s degree in computer science, engineering, information technology, or a related field; and at least 5 years of professional experience managing Linux/UNIX systems in heterogeneous environments. An equivalent combination of education and experience will be considered.
Demonstrated experience with high-performance computing (HPC) storage systems and enterprise storage platforms (e.g., Lustre, GPFS, BeeGFS, or WEKA).
Proficiency in scripting languages (e.g., Python, Bash, Perl) and configuration management, automation, and version control tools (e.g., Ansible, Puppet, Git).
Strong communication, collaboration, and problem-solving skills with the ability to design and implement solutions independently.

Preferred Qualifications

Active DOE Q, DoD Top Secret, or TS/SCI clearance.
Hands-on experience with HPC cluster technologies, including job schedulers (e.g., Slurm) and system deployment tools (e.g., Warewulf, PXE boot, Bright Cluster Manager).
Expertise in high-performance parallel file systems, tape library systems, and storage and networking technologies (e.g., RAID, ZFS, NVMe-oF, InfiniBand).
Familiarity with performance monitoring tools (e.g., Grafana, Nagios), benchmarking systems, and I/O optimization techniques.
Experience with virtualization and containerization platforms (e.g., VMware, KVM, Podman, Apptainer).
Background in open source development, including submitting patches…

