Research Platform Engineer (HPC)
General Fusion — Richmond, BC, Canada
Full-time position, listed 2026-03-06
Job specializations: IT/Tech — Data Engineer, Systems Engineer, Cloud Computing, Data Scientist
Position: Research Platform Engineer (HPC)
About Us:
Established in 2002, General Fusion is a global leader in the race to commercialize clean fusion energy. We are pursuing a uniquely practical approach, Magnetized Target Fusion, and aim to provide zero-carbon fusion power to the grid in the early to mid-2030s. Today at our state-of-the-art labs in Richmond, BC, we’re operating a groundbreaking fusion demonstration machine called Lawson Machine 26 (LM26), designed to achieve transformational technical milestones and accelerate General Fusion’s technology to commercialization.
Our path to market is funded by a global syndicate of leading energy venture capital firms, industry leaders, and technology pioneers. Learn more at
Position Overview:
General Fusion's research relies heavily on experimental data and computer simulation to design and operate its experimental devices. We're seeking a versatile technical lead to support the infrastructure that empowers our scientists, including managing our High-Performance Computing (HPC) environment and contributing to our research data infrastructure.
This is a dual role: as the HPC Administrator, half of your time will be spent ensuring our computing cluster is stable, optimized, and serving the needs of the science teams. The system runs Rocky Linux and comprises 70 compute nodes and 1 PB of storage. The other half of your time will be spent contributing to our on-prem data systems that transform and serve our experimental data, with a focus on moving toward modern data architecture patterns and technologies.
This role will help shape the computational research infrastructure at a scientific R&D startup. You'll have opportunities to propose architectural changes, reduce complexity, and build out systems that directly accelerate scientific discovery. If you're energized by working at the intersection of infrastructure, data, and scientific computing, this role is for you.
Responsibilities:
- Act as the primary source of HPC expertise within General Fusion
- Cluster administration, including maintaining the OS and software environment, resource provisioning and allocation, managing the job scheduler (SLURM), user account management, and monitoring system health and performance
- Provide training and support for HPC users
- Collaborate with IT on networking and physical infrastructure; ensure alignment with IT policies, security standards, and corporate governance requirements, including applicable SOX controls
- Design high-performance data architectures for storage, retrieval, and analysis of complex research datasets; contribute to data versioning, result reuse, and metadata cataloging systems
- Contribute to the modernization of data processing pipelines, with an eye toward simplification and maintainability
- Proactively monitor system health and performance across both compute nodes and data pipelines
- Seek opportunities to consolidate tooling and reduce operational overhead
- Act as a bridge between traditional HPC computing and modern data platform patterns, helping integrate simulation data with experimental data systems
- Maintain and improve technical documentation
- Contribute to strategic planning and decision-making to help drive the evolution of General Fusion's data systems
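To give a concrete sense of the scheduler administration work involved, here is a minimal `slurm.conf` sketch for a cluster of this shape. The node count mirrors the 70-node Rocky Linux system described above, but all names, hardware specs, and partition limits are hypothetical illustrations, not General Fusion's actual configuration:

```ini
# Hypothetical slurm.conf excerpt -- node names, CPU/memory figures,
# and time limits are placeholders for illustration only.
ClusterName=research-hpc
SlurmctldHost=head01
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# Compute nodes (specs are invented)
NodeName=node[001-070] CPUs=64 RealMemory=256000 State=UNKNOWN

# Partitions: a default general queue plus a short debug queue
PartitionName=general Nodes=node[001-070] Default=YES MaxTime=7-00:00:00 State=UP
PartitionName=debug Nodes=node[001-004] MaxTime=01:00:00 State=UP
```

In practice this file pairs with resource accounting, fair-share policy, and health monitoring, which is the day-to-day substance of the cluster administration bullet above.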
Requirements:
- Degree in Computer Science, Computer Engineering, Engineering Physics, or a related field
- 5+ years of professional experience in an applied R&D environment, working in scientific computing and/or research data infrastructure
- 2+ years of experience managing HPC clusters, with a solid understanding of InfiniBand, MPI/parallel computing concepts, storage architectures, and workload scheduling (SLURM)
- 2+ years of platform or data engineering, specifically building systems that serve technical users
- Experience across the modern Linux systems lifecycle, including OS administration (e.g. Rocky, Ubuntu, RHEL), container orchestration (Apptainer/Singularity, Docker), and declarative infrastructure to ensure environment reproducibility
- Proficiency in low-level resource management (CPU/memory/IO) and system-level performance tuning
- Experience implementing alerting, logging, and monitoring tools to track system health and performance (Prometheus,…