
HPC Engineer

Job in Somerville, Middlesex County, Massachusetts, 02145, USA
Listing for: DeWinter Group
Contract position
Listed on 2025-12-07
Job specializations:
  • IT/Tech
    Cloud Computing, Systems Engineer

Title: HPC Engineer

Job Type: Contract

Contract Length: 6-7 Month Contract (with potential for extension)

Target Start Date: January

Work Location/Structure: Remote (local to the Northeast or Midwest preferred)

About the Opportunity

Our client, a leader in Academic Research and Higher Education, is looking for a skilled HPC Engineer to join their team for a 6-7 month contract engagement. This project involves scaling and maintaining a critical High-Performance Computing (HPC) ecosystem used by university researchers for parallel processing, AI/ML applications, and massive data transfers. This is a high-impact role that requires a self‑motivated, tenured professional who can immediately contribute to the stability and efficiency of a complex, large‑scale research computing environment.

Key Responsibilities & Deliverables
  • Maintain the entire HPC ecosystem, including system specification, provisioning, OS installation (Rocky Linux), and managing updates/changes to approximately 200 Linux systems. This includes login/file transfer nodes, compute nodes, job schedulers (Slurm), and virtualization (VMware).
  • Apply configuration management and security best practices to maintain all systems, using Ansible and the Warewulf cluster management system.
  • Manage the Globus data transfer software and support the storage team with VAST and TrueNAS storage maintenance. Provide support for data indexing tools like Starburst.
  • Maintain and support user‑facing HPC web gateways and research tools (e.g., Open OnDemand, Jupyter Notebook/Lab/Hub, FastX, Open XDMoD).
  • Respond to outage/urgent system issues and develop/document continual operational improvements in the HPC system administration service. Assist with vendor management as needed.
Required Skills & Experience
  • 5+ years of experience in a similar role within a large‑scale enterprise or research environment, with a "tenured" approach to system administration.
  • Deep expertise in Linux Systems Administration, Ansible, and HPC cluster management tools like Warewulf and the Slurm job scheduler. This isn't a learning role; you need to be a subject matter expert.
  • Demonstrated ability to work autonomously and manage your own time effectively to meet project goals and handle critical system issues.
  • Experience installing and maintaining common research computing frameworks and software, particularly AI/ML/DL libraries (TensorFlow, PyTorch) and container platforms.
  • Familiarity with high‑performance storage solutions like VAST and TrueNAS, and experience with Globus or a strong willingness to learn it quickly.
  • Strong communication skills to provide clear, concise status updates to the project team, plus technical expertise in network, storage administration, and data center issues.
  • Scripting proficiency in Shell or Python is a plus.