HPC Systems Architect
Houston, Harris County, Texas, 77246, USA
Listed on 2026-06-02
IT/Tech
Cloud Computing, Data Engineer
Job Description
High Performance Computing (HPC) Systems Architect
Job #: 3035781
Location:
Houston, Texas (Partial Remote)
Employment Type:
Contract to Perm
An organization is deploying a new High Performance Computing cluster and seeks an HPC-focused professional to administer, support, and enable research workloads on this environment. The role focuses on HPC environment management and research support, acting as the primary owner and advocate for the HPC environment and its users within a team with strong Linux system administration expertise. Experience with researchers or data scientists, ideally in an academic, public health, or scientific research context, is required.
Key Responsibilities
- Manage, monitor, and maintain the new HPC cluster, including compute, GPU, high‑memory, hybrid, and management nodes.
- Oversee and optimize the Slurm job scheduler, including configuration, policies, queues, troubleshooting user jobs, and performance tuning.
- Operate and support Tier 1 storage (PixStor) and its integration with Tier 2 storage (Dell EMC Isilon/PowerScale).
- Act as a technical liaison between IT and the research community, supporting researchers and data scientists with onboarding, software configuration, and workload optimization.
- Translate research needs into practical workflows on the cluster, providing guidance on best practices for running jobs and managing data.
- Develop user-facing documentation, quick‑start guides, and FAQs for researchers and data scientists.
- Deliver training, workshops, and onboarding sessions to help users learn command‑line basics, use scientific tools, and manage jobs via Slurm.
- Collaborate with internal teams, faculty liaisons, and external vendors on support, enhancements, and long‑term planning for HPC services.
Qualifications
- Hands‑on experience with HPC environments.
- Experience working with researchers or data scientists in an academic, public health, or scientific research context.
- Comfortable working on site with physical hardware and data center environments.
- Strong Linux experience in a server or cluster environment.
- Practical experience with job schedulers, specifically Slurm (configuration, job submission, troubleshooting, and optimization).
- Ability to understand and support software commonly used in research/HPC, such as Python‑based workflows and scientific libraries, and to communicate technical concepts clearly to non‑experts.
Preferred Qualifications
- Extensive background as a data scientist or in a research computing support role.
- Experience with PixStor or similar high‑performance storage systems.
- Experience with Dell EMC Isilon/PowerScale or other large‑scale NAS platforms.
- Familiarity with Bash, shell scripting, and scientific Python ecosystems.
- Prior experience designing or managing HPC clusters and delivering user training.
- Experience in higher education, healthcare, or public health research environments.
This position follows a hybrid work model with a mix of on‑site and remote work, typically three days on‑site and two days remote, with flexibility. The candidate must be available to come on‑site as needed for data center access, physical hardware issues, and vendor visits. The schedule consists of standard daytime hours with some flexibility required for urgent issues affecting research workloads.
Compensation & Benefits
A comprehensive benefits package is available to eligible employees, including a pension program and long‑term health insurance benefits that vest over time. The position offers a potential growth path toward a Senior Systems Architect role for a strong, long‑term performer.
Equal Opportunity Employer
This employer is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or protected veteran status and will not be discriminated against on the basis of disability.