GPU Systems Engineer
Job in Bethesda, Montgomery County, Maryland, 20814, USA
Listed on 2026-07-01
Listing for: Base-2 Solutions, LLC
Full Time position
Job specializations:
- IT/Tech
Systems Engineer, Cloud Computing: Infrastructure & Operations, Unix/Linux
Position Summary
Support enterprise AI mission systems by designing, developing, and optimizing GPU clusters, with deep focus on operating systems, hardware, GPU platforms, and high-speed networking in a secure customer environment.
Essential Duties and Responsibilities
- Design, configure, and maintain GPU clusters.
- Collaborate with a multidisciplinary team to define and optimize architectures for performance, power efficiency, and required features.
- Work closely with AI/ML engineers to integrate GPUs with Linux-based systems.
- Optimize GPU drivers for compatibility, reliability, and performance.
- Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers.
- Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments.
- Leverage Bash, Python, Ansible, Puppet, and Salt for tooling and automation.
- Maintain technical documentation, architectural specifications, and Linux best practices.
- Support ATO activities and ensure compliance with federal security standards.
Minimum Qualifications
- Active TS/SCI with ability to obtain a CI Polygraph.
- Bachelor's degree with a minimum of six years of experience in the category field. Three additional years of experience may be substituted for the bachelor's degree.
- Experience managing NVIDIA GPU data center platforms, including DGX and HGX systems and H200, H100, and L4 GPUs.
- Knowledge of enterprise server components, including storage/network controllers, HBAs, and SSDs.
- Strong expertise with Linux distributions, including RHEL, Ubuntu, Oracle Linux, and Rocky Linux.
- Excellent problem-solving skills and the ability to collaborate within a team.
- Meet DoD 8570.01-M IAT Level II certification requirements at a minimum; IAT Level III is also acceptable.
- U.S. citizenship is required due to the nature of the government contracts supported.
Preferred Qualifications
- Experience with Kubernetes cluster management and AI/ML workflow orchestration, including Argo, Airflow, and Kubeflow.
- Familiarity with GPU virtualization and cloud computing.
- Experience with Prometheus and Grafana for monitoring.
- Knowledge of distributed resource scheduling systems such as Slurm, LSF, or similar tools.
Required Certifications
- DoD 8570.01-M IAT Level II certification: Security+ CE, CCNA-Security, GICSP, GSEC, or SSCP.