
Overview
HPC Systems Engineer
Location: Charlottesville, VA

Clearance Required: Active TS (SCI eligibility)

At Bcore, our strength comes from how we deliver impact to the mission. Whether it’s architecting critical IT solutions, producing actionable intelligence, or developing cutting-edge technology, we succeed because of the expertise, collaboration, and agility of our teams. Our Mission Services division combines enterprise IT, cloud solutions, DevSecOps, systems engineering, software development, and operational support. Bcore accelerates decisive advantage for warfighters and intelligence professionals by fusing human insight, rapid-fire engineering, precision-measured outcomes, and relentless grit into mission-ready solutions.

Do you want to join a team that is building tailored technical solutions to modernize our government’s mission and our clients’ business? Do you have a desire to change how people work? Are you interested in helping to protect our nation’s cyber interests? Join our growing team supporting Army customer missions as an HPC Systems Engineer.

Responsibilities
What you get to do every day:

Build, configure, and maintain secure HPC clusters for simulations, scientific computing, and GPU workloads
Collaborate with infrastructure teams on cluster platforms, including schedulers, provisioning systems, high-speed interconnects, and distributed nodes
Configure and manage job schedulers (Slurm, PBS) with queue setup, resource policies, and job optimization
Support containerized workloads (Docker, Podman, Singularity/Apptainer)
Assist with cluster provisioning, node management, and initial build-out, including scheduler configuration and validation
Troubleshoot hardware, OS, scheduler, networking, and high-performance interconnect issues (e.g., InfiniBand)
Integrate compute nodes and hardware into clusters
Develop automation and operational tools using Bash, Python, or similar scripting
Support authentication and access control via LDAP or Kerberos
Analyze performance and identify bottlenecks across compute, storage, and network layers for distributed workloads (MPI/OpenMP)
Support GPU-enabled environments and CUDA-based workloads
Coordinate with engineering teams to improve cluster performance, stability, and scalability
Maintain documentation for configurations, procedures, and troubleshooting
Provide technical guidance on HPC best practices for mission workloads

Qualifications

Clearance Required: Active TS clearance (with SCI eligibility) and eligibility to obtain a CI Poly. We are not able to upgrade or sponsor clearances.

Certification Required:

Ability to obtain DoD 8140 (8570) IAT Level II certification

Education/Experience:

Bachelor's degree in Engineering, Computer Science, or a related STEM field (experience may be considered in lieu of a degree)
6+ years of experience administering Linux-based systems in enterprise, research computing, or distributed compute environments, including configuration and troubleshooting of multi-node systems

Required Skills:

Experience supporting distributed compute environments with workload schedulers (e.g., Slurm, PBS, Torque, Grid Engine)
Experience supporting multi-node compute environments or HPC clusters
Professional experience administering Linux systems via CLI (RHEL derivatives preferred)
Experience with scripting and automation (Bash, Python, or similar)
Experience troubleshooting server hardware, OS, and distributed computing systems
Familiarity with cluster networking and high-speed interconnects
Experience diagnosing performance issues across compute, networking, and storage layers
Strong troubleshooting and documentation skills

What is ideal?

Experience administering multi-node HPC clusters and supporting distributed workloads
Knowledge of parallel file systems (e.g., Lustre, BeeGFS, GPFS)
Experience with parallel computing frameworks (MPI, OpenMP)
Experience with configuration management tools (Ansible, Puppet)
Experience supporting GPU-enabled environments and CUDA workloads
Familiarity with hybrid HPC architectures (on-prem + cloud, e.g., AWS)
Experience supporting HPC systems in research, lab, or mission environments
Experience working in DoD or IC environments…