Title: Senior Computing System Administrator
Listed on 2025-12-02
IT/Tech
Systems Engineer, IT Support, Cloud Computing, Systems Administrator
Salary Range
$90,000.00 - $
Overview
Working at Yale means contributing to a better tomorrow. Whether you are a current resident of our New Haven-based community, eligible for opportunities through the New Haven Hiring Initiative, or a newcomer interested in exploring all that Yale has to offer, your talents and contributions are welcome. Discover your opportunities at Yale!
The Yale Center for Research Computing (YCRC) is looking for a versatile system administrator/engineer to help ensure that Yale’s exceptional faculty and students have the AI HPC infrastructure they need to propel discovery and scholarship to improve the world.
Join our growing team of system specialists, research facilitators, and project administration experts, focusing your work especially on GPU infrastructure enhancements and improvements as part of Yale’s comprehensive campus investment in AI.
As an experienced subject matter expert, you will help lead the system design, deployment and support of YCRC’s AI-focused research cluster and storage infrastructure. This role is primarily systems-facing, but has a researcher-facing component as well. Frequent interaction with other systems team members, research support specialists, and researchers is a routine part of the job. You will be expected to stay current on developments and trends in accelerator and overall high performance computing technologies, processes, and methodologies.
We will look to you for insights on evolving tradeoffs in areas such as accelerator-based memory, precision, interconnects, power consumption, and cost.
This is a hybrid position, with a minimum of two days per week on site. YCRC’s office space is on the Yale campus. As part of the systems team, you will be expected to provide on-site equipment maintenance as needed. Infrastructure is hosted at a Yale data center in West Haven, CT, and at the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke, MA.
Required Skills and Abilities
- Experience with accelerators such as GPUs for AI, including expertise with system-level tradeoffs in such areas as accelerator-based memory, precision, within-node interconnect, multi-node interconnect, cost and power consumption.
- Expertise in administration of HPC Linux clusters, including managing and configuring cluster provisioning and management tools and the batch scheduler.
- Experience with high-speed networking such as InfiniBand and high-speed Ethernet.
- Experience with large storage systems and parallel file systems such as GPFS and Lustre.
- Expertise in Linux system administration, including managing the operating system, networking, storage, and security.
- Expertise in automation and scripting in at least one scripting language.
- Ability to work in a team environment in a fast-moving technology field. Excellent verbal and writing skills.
- Ability to interact well with team members and end users. Ability to work independently and across units.
- Attention to detail. Ability to take the care necessary to be entrusted with a system that hundreds of users depend on for research computation and the storage of research data.
Preferred Skills and Abilities
- Demonstrated ability to specify, install, configure, and support multi-node GPU systems, and tune them for AI applications.
- Demonstrated ability to design, implement, and maintain a local, customized implementation and configuration of a core HPC system such as the HPC provisioning system, the resource-management system, account/user lifecycle management, or user authentication and authorization systems.
- Experience supporting technology in a research environment.
- Expertise in configuration, deployment, support, and backup of large-scale parallel storage systems.
- Experience administering high-speed networking such as InfiniBand or high-speed Ethernet in a cluster environment.
- Expertise in computer security, preferably in the context of large, multi-user Linux environments.
- Experience in a data-center environment, installing and troubleshooting hardware.
- Professional certifications related to the above.
- Graduate degree in a related field.