System Administrator

Job in Faro, Yukon Territory, Canada
Listing for: MGIS Inc.
Full Time position
Listed on 2026-06-27
Job specializations:
  • IT/Tech
    Systems Administrator, IT Support, Unix/Linux

MGIS is seeking a System Administrator, Level 2, to manage High Performance Computing (HPC) clusters and support the scientists who rely on them. This role blends HPC system administration with hands‑on user support — helping researchers install, run, and debug applications on HPC infrastructure so they can focus on their science instead of IT issues.

HPC environments in scope include clustered CPU/GPU systems with job schedulers and attached parallel storage (e.g., Lustre, GPFS).

What you'll be doing
  • Maintain the HPC cluster — hardware, image management, local networking, scheduler, and backups
  • Troubleshoot environment incidents to ensure a quick return to normal operations
  • Meet with scientists to evaluate their HPC support requirements
  • Develop task plans to meet researchers' needs, consulting the technical authority for approval
  • Support application builds, installs, and runtime troubleshooting (GNU, Intel, Fortran, NVIDIA)
  • Support open‑source and commercial software, including Python/Anaconda installs, Bash scripting, build/make tools, EasyBuild, Spack, and MPI implementations (MPICH, Open MPI, Intel MPI, HP-MPI)
  • Assist with compilation and runtime of in‑house developed applications
  • General systems management: Linux OS patching schedules and reliability
  • Manage user accounts (creation, deletion) and environment modules
  • Manage configuration via Git, MS DevOps, and Ansible playbooks
  • Manage RPM/DEB packages and troubleshoot ThinLinc
  • Troubleshoot jobs on schedulers (PBS Pro/Torque, SLURM, SGE)
  • Ensure reliable CUDA installs; troubleshoot GPU failures and CUDA software/driver issues
  • Provide hardware support: memory upgrades, storage arrays, power/network cabling, iLO
  • Document every process and task to support enterprise knowledge continuity
  • Submit weekly progress reports to the Technical Authority

Requirements
  • Solid experience administering Linux-based HPC clusters (CPU/GPU nodes, schedulers, parallel storage)
  • Hands‑on experience with job schedulers such as PBS Pro/Torque, SLURM, or SGE
  • Experience troubleshooting CUDA installations, GPU failures, and driver issues
  • Familiarity with scientific computing toolchains: compilers (GNU, Intel), MPI implementations, EasyBuild, and Spack
  • Experience supporting researchers or end‑users with application builds and runtime issues
  • Working knowledge of configuration management tools (Git, Ansible, MS DevOps)
  • Comfortable working independently and producing clear technical documentation
  • Eligible to obtain and maintain a Secret‑level security clearance