Graphics Processing Unit (GPU) Engineer - TS/SCI
Listed on 2026-06-18
IT/Tech
Systems Engineer
Location: Bethesda, MD
Category: Systems Engineering
Travel Required: No
Remote Type: No
Clearance: TS/SCI
Sunayu, LLC is looking for a highly skilled Systems Engineer with deep expertise in operating systems, hardware, GPUs, and high-speed networking. In this role, you will design, develop, and optimize GPU clusters that power enterprise AI for our mission customers.
This is a 100% on-site position.
Responsibilities
- GPU Cluster Engineering: Design, configure, and maintain GPU clusters. Collaborate with a multidisciplinary team to define and optimize architectures, ensuring they meet performance, power efficiency, and feature requirements.
- Operating System Integration: Work closely with AI/ML engineers to ensure smooth GPU integration with Linux-based systems. Optimize GPU drivers for compatibility, reliability, and performance. Provide regular maintenance and updates.
- Performance Optimization: Analyze GPU performance, identify bottlenecks, and develop strategies to improve efficiency across hardware and software layers.
- Tooling and Automation: Build and maintain debugging tools, profiling utilities, and performance analysis software for Linux environments. Leverage scripting and configuration tools such as Bash, Python, Ansible, Puppet, and Salt.
- Compliance & Documentation: Maintain technical documentation, architectural specifications, and Linux best practices. Support ATO (Authority to Operate) and ensure compliance with federal security standards.
Required Qualifications
- Bachelor's or higher degree in Computer Science, Computer Engineering, Electrical Engineering, or a related field with at least 12 years of related technical experience. Additional years of experience may be considered in lieu of a degree.
- 10+ years of relevant systems engineering experience.
- Experience in managing NVIDIA GPU data center platforms (DGX, HGX, H200, H100, L4s).
- Knowledge of enterprise server components (storage/network controllers, HBA, SSDs).
- Strong expertise with Linux distributions (RHEL, Ubuntu, Oracle Linux, and Rocky Linux).
- Excellent problem-solving skills and ability to collaborate within a team.
- Candidate must meet DoD 8570.11-IAT Level II certification requirements (Security+ CE, CCNA-Security, GICSP, GSEC, SSCP with appropriate computing environment CE) or have an IAT Level III certification (CASP+, CCNP Security, CISA, CISSP, GCED, GCIH, CCSP).
- Active TS/SCI clearance with polygraph required, or active TS/SCI and willingness to obtain and maintain a polygraph.
- US citizenship is required due to the nature of the government contracts we support.
Preferred Qualifications
- Experience with Kubernetes cluster management and AI/ML workflow orchestration (Argo, Airflow, and Kubeflow).
- Familiarity with GPU virtualization and cloud computing.
- Experience with Prometheus/Grafana for monitoring.
- Knowledge of distributed resource scheduling systems such as Slurm (preferred) or LSF.
Salary range considers factors such as (but not limited to) scope and responsibilities of the position, candidate's work experience, education/training, key skills, as well as market and business considerations when extending an offer.
Benefits
- 3 Medical Plan Options
- Dental and Vision
- FSA, DCFSA, HSA
- Life/AD&D Insurance
- Short-Term & Long-Term Disability
- Employee Assistance Program (EAP)
- Training and Educational Assistance
- Paid Time Off (PTO)
- 11 Federal holidays
- 401k plan with up to a 6% match (100% immediate vesting)
Sunayu, LLC is an Equal Opportunity/Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, gender expression, national origin, age, protected veteran status, disability status, marital status, genetic information, medical condition, or any other characteristic protected by law.