Senior HPC Linux Systems Engineer
Listed on 2026-02-16
IT/Tech
Systems Engineer, IT Support
Location: Knoxville, TN
Job: 496
# of Openings: 1
Founded in 1999 in the beautiful Smoky Mountains of East Tennessee, Cadre5 provides innovative technical solutions to our customers locally and nationally. Our Cadre5 Lab Partners division has partnered with the National Center for Computational Sciences (NCCS) at Oak Ridge National Laboratory (ORNL) to recruit a Senior HPC Linux Systems Engineer to play a key role in improving the security, performance, and reliability of the NCCS computing environments.
This includes supporting one of the fastest supercomputers in the world, Frontier, along with numerous commodity clusters and specialized programs and partnerships. Frontier is one of the scientific research community's most powerful computational instruments for exploring solutions to some of today's most challenging problems.
As a Senior HPC Linux Systems Engineer, you will work in the HPC Scalable Systems Group within the NCCS Systems Section, supporting a wide range of center activities.
The Scalable Systems Group oversees, administers, and supports system installation, deployment, acceptance, performance testing, upgrades, problem diagnosis, and troubleshooting of large-scale HPC computational resources.
The HPC Systems Section is part of the National Center for Computational Sciences (NCCS) Division and is responsible for the division's computing, storage, networking, and infrastructure systems and services.
The NCCS provides state-of-the-art computational and data science infrastructure, coupled with dedicated technical and scientific professionals, to accelerate scientific discovery and engineering advances across a broad range of disciplines. NCCS hosts the Oak Ridge Leadership Computing Facility, one of DOE's National User Facilities.
ORNL delivers scientific discoveries and technical breakthroughs needed to realize solutions in energy and national security and provides economic benefit to the nation. This premier research institution located near Knoxville in Oak Ridge, TN, addresses national needs through impactful research and world-leading research centers.
This is a full-time, permanent position that follows a hybrid/onsite working arrangement.
Why Cadre5?
- Working with highly talented team members
- Excellent medical insurance, including employer-paid benefits
Responsibilities:
- Installing, integrating, and administering HPC Linux clusters and high-speed networks
- Diagnosing system operational problems quickly and effectively
- Coordinating with vendors to resolve hardware and software problems
- Recommending, planning, and coordinating hardware and software changes with customer participation
- Porting and writing system management tools
- Documenting system administration procedures for routine and complex tasks
- Participating in a 24-hour, 7-day on-call support rotation and off-hours maintenance windows
- Implementing and integrating systems into the NCCS environment and analyzing system performance
- Leading the deployment, integration, and troubleshooting of large-scale computer systems
- Engaging with internal and external peer communities on relevant systems topics, contributing experiences and solutions
- Mentoring junior staff as they join the group
Qualifications:
- A Bachelor's degree in a scientific or technical field and 8+ years of Linux systems experience are required. An equivalent combination of education and experience will be considered.
- The ability to obtain and maintain a Department of Energy "Q" clearance is required; this requires U.S. citizenship.
- Experience managing Linux operating systems in a large-scale system environment
- Solid understanding of networked computing environment concepts
- Experience with Linux Cluster Administration
- Ability to develop and maintain programs and scripts that support operations and automate administrative tasks, using shell and programming languages such as Bash, Python, and Go
- Experience with Lustre and GPFS file systems
- Experience with batch schedulers (particularly Slurm)
- Experience deploying and maintaining automated configuration management software such as Puppet
- Strong interpersonal and communication skills
- Ability to work as a team player
- Proactive and solution-oriented problem solver
- Prior project and/or team leadership experience
Cadre5 offers excellent pay and benefits, including full medical, dental, and vision coverage, a 401(k) match, 15 days of PTO, and 10 holidays.
Cadre5 is an equal opportunity employer. All qualified applicants, including individuals with disabilities and protected veterans, are encouraged to apply. Cadre5 is an E-Verify Employer.