Linux/HPC Systems Engineer
Listed on 2026-02-09
IT/Tech
Systems Engineer
In this role, your daily impact spans the entire spectrum of systems engineering. One hour, you might be performing routine lifecycle maintenance (patching a fleet of RHEL workstations or managing user identities across a heterogeneous domain) to ensure the baseline stability of our enterprise. The next, you are diving into the high-performance fabric, debugging a latency spike on an InfiniBand card or fine-tuning a Slurm scheduler to prioritize a mission-critical simulation.
You aren't just managing boxes; you are the bridge between raw silicon and national security breakthroughs. Whether it's the methodical "hardening" of a standard server build to meet SAP requirements or the high-adrenaline optimization of a multi-petabyte Lustre file system, your work ensures that our researchers never have to wait on the infrastructure to catch up with their imagination. This position is 100% on-site.
Responsibilities
- Architect & Deploy: Lead the design and lifecycle management of mission-critical Linux workstations, enterprise-grade servers, and high-performance computing (HPC) clusters.
- Engineer File Systems: Master the art of data movement. Administer complex local and distributed file systems (Lustre, GPFS/Spectrum Scale) to ensure extreme-speed access across the fabric.
- Infrastructure as Code (IaC): Treat the data center as a codebase. Develop sophisticated automation workflows using Python, Bash, and Ansible to eliminate manual toil and ensure drift-free configurations.
- Defensive Engineering: Implement "Hardened by Design" security. Fine-tune SELinux policies and advanced firewall configurations to protect sensitive data without sacrificing computational performance.
- Container Orchestration: Modernize scientific workflows by deploying and managing isolated environments using Podman while working to establish a Kubernetes environment.
- HPC Performance Tuning: Push the limits of the silicon. Optimize cluster scheduling and management utilizing industry-leading tools like Bright Cluster Manager and Slurm.
- Low-Latency Networking: Configure and optimize high-bandwidth networking, including InfiniBand fabrics, for seamless inter-node communication.
- Technical Documentation: Author high-fidelity playbooks and strategic architectural diagrams that serve as the blueprint for our evolving infrastructure.
- Bachelor’s Degree in related field or equivalent high-level professional experience in mission-critical environments
- Minimum of 1 to 10 years of related experience
- U.S. Citizenship required: Active DoD Top Secret security clearance with eligibility for SCI, along with successful completion of CI Scope Polygraph within 180 days of hire
- Ability and willingness to obtain and maintain Special Access Program (SAP) eligibility
- Active DoD 8570.01-M baseline certification (Security+ CE, SSCP, or equivalent)
- Deep-tier professional experience in Linux systems engineering (RHEL/Rocky preferred)
- Active TS/SCI clearance with a current CI Polygraph
- Advanced Certification: RHCE, RHCSA, or similar
- Direct experience tuning kernel parameters and MPI libraries for large-scale distributed computing
- Expertise in VMware, Nutanix, or KVM within a heterogeneous environment that includes Windows integration
Applicant selected will be subject to a government security investigation and must meet eligibility requirements for access to classified information. COLSA Corporation is an Equal Opportunity Employer, Minorities/Females/Veterans/Disabled. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, or national origin.