HPC Systems Engineer

Job in Charlottesville, Albemarle County, Virginia, 22904, USA
Listing for: SAIC
Full Time position
Listed on 2026-04-17
Job specializations:
  • IT/Tech
    Systems Engineer, Cloud Computing
Salary/Wage Range or Industry Benchmark: 60000 - 80000 USD Yearly
Job Description & How to Apply Below

Overview

SAIC is looking for a highly qualified HPC Systems Engineer to support the Army's Golden Dome initiative. The engineer will support the deployment and sustainment of Linux-based High Performance Computing (HPC) cluster environments used for distributed compute workloads, simulation environments, and GPU-enabled processing.

The environment will include:

  • multi-node Linux compute clusters
  • workload scheduling platforms such as Slurm or PBS
  • cluster provisioning frameworks (e.g., xCAT, Warewulf)
  • high-performance networking technologies including RDMA / InfiniBand
  • distributed parallel compute workloads utilizing MPI or OpenMP
  • GPU-enabled compute resources supporting CUDA-based processing

The system will be used to support scientific computing, simulation workloads, and other distributed compute operations within a secure research environment.

Candidates should be comfortable working within cluster-scale computing environments where performance, scheduler configuration, and distributed workload execution are critical operational factors.

The HPC Systems Engineer will support the build-out, configuration, and sustainment of HPC cluster platforms.

Responsibilities
  • cluster platform configuration
  • scheduler administration
  • distributed compute troubleshooting
  • performance analysis across compute, storage, and network layers
  • GPU compute workload support
  • automation and operational tooling

Candidates should have experience working with multi-node Linux cluster environments and distributed compute workloads.

Core Technical Capabilities

HPC Cluster Platforms

Experience supporting multi-node Linux compute clusters, including node integration, configuration, and operational sustainment.

Experience with cluster provisioning tools such as xCAT, Warewulf, or similar node deployment systems is beneficial.

Workload Scheduling Platforms

Experience supporting distributed compute workloads using schedulers such as:

  • Slurm
  • PBS / PBS Pro
  • Torque
  • Grid Engine

Candidates should understand queue configuration, job submission workflows, and scheduler troubleshooting.

Candidates should understand how workload schedulers interact with distributed compute workloads and containerized execution environments.

Linux Systems Administration

Strong Linux administration experience including:

  • command-line system administration
  • server and compute node configuration
  • system troubleshooting in distributed compute environments

Experience with RHEL-based environments is preferred.

Distributed and Containerized Workloads

Experience supporting distributed compute workloads utilizing parallel computing frameworks such as:

  • MPI
  • OpenMP
  • GPU compute frameworks

Familiarity with container technologies commonly used in HPC environments such as:

  • Docker
  • Podman
  • Singularity / Apptainer

Candidates should understand how containerized workloads interact with schedulers, GPU resources, and distributed compute environments.

Experience supporting containerized HPC workloads or integrating container platforms with cluster infrastructure is desirable.

HPC Networking

Familiarity with high-performance networking technologies including:

  • RDMA networking
  • InfiniBand
  • high-throughput cluster networking architectures

Candidates should be comfortable assisting with troubleshooting cluster communication or performance issues.

GPU Compute Environments

Experience supporting GPU-enabled compute environments and workloads utilizing CUDA frameworks is desirable.

Automation and Operational Tooling

Experience writing scripts or operational tooling using languages such as:

  • Bash
  • Python

Automation experience supporting system administration or cluster operations is beneficial.

Qualifications

Candidates must meet the following requirements:

  • Bachelor's degree in science/technology; 10 additional years of experience can be substituted for the degree
  • 8+ years of experience is required
  • Minimum 6 years of experience administering Linux systems in enterprise, research computing, or distributed compute environments
  • An Active Top Secret clearance is required; an active TS/SCI clearance must be obtained prior to…