Research Platform Engineer (HPC)
General Fusion — Richmond, BC, Canada
Full-time position, listed 2026-03-06
Job specializations: IT/Tech — Data Engineer, Systems Engineer, Cloud Computing, Data Scientist
Position: Research Platform Engineer (HPC)
About Us:
Established in 2002, General Fusion is a global leader in the race to commercialize clean fusion energy. We are pursuing a uniquely practical approach, Magnetized Target Fusion, and aim to provide zero-carbon fusion power to the grid in the early to mid-2030s. Today at our state-of-the-art labs in Richmond, BC, we’re operating a groundbreaking fusion demonstration machine called Lawson Machine 26 (LM26), designed to achieve transformational technical milestones and accelerate General Fusion’s technology to commercialization.
Our path to market is funded by a global syndicate of leading energy venture capital firms, industry leaders, and technology pioneers. Learn more at
Position Overview:
General Fusion's research relies heavily on experimental data and computer simulation to design and operate its experimental devices. We're seeking a versatile technical lead to support the infrastructure that empowers our scientists, including managing our High-Performance Computing (HPC) environment and contributing to our research data infrastructure.
This is a dual role: as the HPC Administrator, half of your time will be spent ensuring our computing cluster is stable, optimized, and serving the needs of the science teams. The system runs Rocky Linux and comprises 70 compute nodes and 1 PB of storage. The other half of your time will be spent contributing to our on-prem data systems that transform and serve our experimental data, with a focus on moving toward modern data architecture patterns and technologies.
This role will help shape the computational research infrastructure at a scientific R&D startup. You'll have opportunities to propose architectural changes, reduce complexity, and build out systems that directly accelerate scientific discovery. If you're energized by working at the intersection of infrastructure, data, and scientific computing, this role is for you.
Responsibilities:
- Act as the primary source of HPC expertise within General Fusion
- Cluster administration, including maintaining the OS and software environment, resource provisioning and allocation, managing the job scheduler (SLURM), user account management, and monitoring system health and performance
- Provide training and support for HPC users
- Collaborate with IT on networking and physical infrastructure; ensure alignment with IT policies, security standards, and corporate governance requirements, including applicable SOX controls
- Design high-performance data architectures for storage, retrieval, and analysis of complex research datasets; contribute to data versioning, result reuse, and metadata cataloging systems
- Contribute to the modernization of data processing pipelines, with an eye toward simplification and maintainability
- Proactively monitor system health and performance across both compute nodes and data pipelines
- Seek opportunities to consolidate tooling and reduce operational overhead
- Act as a bridge between traditional HPC computing and modern data platform patterns, helping integrate simulation data with experimental data systems
- Maintain and improve technical documentation
- Contribute to strategic planning and decision-making to help drive the evolution of General Fusion's data systems
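To give a concrete sense of the scheduler administration work involved, here is a minimal `slurm.conf` sketch for a cluster of this shape. The node count mirrors the 70-node Rocky Linux system described above, but all names, hardware specs, and partition limits are hypothetical illustrations, not General Fusion's actual configuration:

```ini
# Hypothetical slurm.conf excerpt -- node names, CPU/memory figures,
# and time limits are placeholders for illustration only.
ClusterName=research-hpc
SlurmctldHost=head01
SchedulerType=sched/backfill
SelectType=select/cons_tres
SelectTypeParameters=CR_Core_Memory

# Compute nodes (specs are invented)
NodeName=node[001-070] CPUs=64 RealMemory=256000 State=UNKNOWN

# Partitions: a default general queue plus a short debug queue
PartitionName=general Nodes=node[001-070] Default=YES MaxTime=7-00:00:00 State=UP
PartitionName=debug Nodes=node[001-004] MaxTime=01:00:00 State=UP
```

In practice this file pairs with resource accounting, fair-share policy, and health monitoring, which is the day-to-day substance of the cluster administration bullet above.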
Requirements:
- Degree in Computer Science, Computer Engineering, Engineering Physics, or a related field
- 5+ years of professional experience in an applied R&D environment, working in scientific computing and/or research data infrastructure
- 2+ years of experience managing HPC clusters, with a solid understanding of InfiniBand, MPI/parallel computing concepts, storage architectures, and workload scheduling (SLURM)
- 2+ years of platform or data engineering, specifically building systems that serve technical users
- Experience across the modern Linux systems lifecycle, including OS administration (e.g. Rocky, Ubuntu, RHEL), container orchestration (Apptainer/Singularity, Docker), and declarative infrastructure to ensure environment reproducibility
- Proficiency in low-level resource management (CPU/memory/IO) and system-level performance tuning
- Experience implementing alerting, logging, and monitoring tools to track system health and performance (Prometheus,…