Senior HPC Engineer
Job in Mountain View, Santa Clara County, California, 94039, USA
Listed on 2026-06-02
Listing for: ASRC Federal Holding Company
Full Time position
Job specializations:
IT/Tech
IT Support, Systems Engineer, Systems Administrator, Cloud Computing
Job Description & How to Apply Below
Key Responsibilities:
Design, deploy, and maintain production HPC clusters with 2,000+ nodes, InfiniBand interconnects, and 100+ petabytes of data storage.
Shepherd and/or contribute to scalable feature designs through the entire software development process, from requirements and use cases to release
Designs and develops scripts for system administration, monitoring and usage reporting.
Modify existing software to correct errors and/or improve performance
Designs and develops scripts for system regression and performance testing (file systems (Lustre), scheduler (PBS), interconnect (HDR/NDR, Slingshot), high availability, etc.).
Troubleshoots, isolates and resolves application, system and other technical problems (hardware, software, and network).
Understands research use cases, researches and deploys new technologies, defining cost, performance and other trade-offs.
Manages and maintains tools for provisioning, configuration management (HPCM, Ansible, and Git), resource management, scheduling, and all other necessary aspects of HPC in accordance with best practices.
Researches, deploys and manages networking and security infrastructure, including development of policies and procedures.
Assists in developing and writing proposals and publications.
Creates and provides clear documentation.
Mentors junior staff and cross-trains peers.
After hours/weekend support as required
Moderate Supercomputing System Administration that contributes to:
Day-to-day operations of the Linux HPC clusters and storage systems
Proactively monitoring, analyzing, and correcting system issues
Development of scripts to automate repetitive tasks or tools to enhance support of the HPC systems
System performance analysis and tuning
Building, installing, and supporting user-requested software
Supporting evaluation and assessment of new HPC technology
Resolving user-reported issues and managing support ticket requests in Remedy
Requirements:
Bachelor's degree in computer science or related field
Strong computer science background with in-depth systems-level knowledge in operating systems and networking
A minimum of 10 years of experience in the administration of HPC systems and scheduling software (PBS, Slurm, or Moab/Torque)
A minimum of 10 years of experience in systems programming in heterogeneous, multi-platform HPC environments
Strong ability to analyze, debug and maintain the integrity of an existing code base
Demonstrated equivalent of 5 years of Linux/UNIX user support experience and hands-on experience with administration of Linux systems
Experience working with HPC applications and proficiency in at least C, C++, or Fortran
Superior scripting skills and excellent attention to detail; proficiency in at least Python, Perl, or Bash
Strong ability to interact with customers to understand needs, elicit requirements, and get feedback on prototype solutions
Excellent communication and people skills; excellent time management and organizational skills
Experience with system configuration management tools (e.g., Puppet, Chef, Ansible)
Experience with revision control software (e.g., CVS, SVN, Git)
Track record of delivering commercial-quality software on schedule through multiple release cycles
Proficiency at documentation and technical writing
Preferred Skills:
Proficiency with analysis and problem-solving skills for debugging and optimization of applications
Familiarity/proficiency with OpenMP and Message Passing Interface (MPI) programming
Experience with Lustre and InfiniBand
Experience with cloud technologies (AWS, Azure, GCP), OpenStack, or Kubernetes is a plus
We invest in the lives of our employees, both in and out of the workplace, by providing…
Position Requirements
10+ years of work experience