To submit jobs to the
joltik GPU cluster, where each node provides 4
NVIDIA V100 GPUs (each with 32GB of GPU memory), use:
$ module swap cluster/joltik
To submit to the
accelgor GPU cluster, where each node provides 4
NVIDIA A100 GPUs (each with 80GB GPU memory), use:
$ module swap cluster/accelgor
Then use the familiar commands (qstat, etc.), taking into
account the guidelines outlined in
section Requesting (GPU) resources.
To interactively experiment with GPUs, you can submit an interactive job using
qsub -I (and request one or more GPUs, see
section Requesting (GPU) resources).
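For example, an interactive session with a single GPU could be requested as follows (the core count and walltime shown are assumptions; pick values that match the guidelines and the limits of the cluster you use):

```shell
# Hypothetical example: request an interactive job with 1 GPU and 8 cores
# on one node, for 2 hours (adjust ppn and walltime to your needs).
qsub -I -l nodes=1:ppn=8:gpus=1 -l walltime=2:0:0
```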
Note that, due to a bug in Slurm, you are currently not able to interactively use MPI software that requires access to the GPUs. If you need this, please contact us via email@example.com.
Requesting (GPU) resources
There are 2 main ways to ask for GPUs as part of a job:

1. As a node property (similar to the number of cores per node), specified via
-l nodes=X:ppn=Y:gpus=Z (where ppn=Y is optional), or as a separate resource request (similar to the amount of memory) via
-l gpus=Z. Both notations give exactly the same result. The
-l gpus=Z notation is convenient if you only need one node and you are fine with the default number of cores per GPU. The
-l nodes=...:gpus=Z notation is required if you want full control, or in multi-node cases like MPI jobs. If you do not specify the number of GPUs and just use
-l gpus, you get 1 GPU by default.

2. As a resource of its own, via
--gpus X. In this case, however, you are not guaranteed that the GPUs are on the same node, so your script or code must be able to deal with this.
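As a concrete illustration of the first notation, the following (hypothetical) job script headers both request 1 node with 2 GPUs; the ppn value shown is an assumption:

```shell
# Option A: node property notation (full control, also works multi-node)
#PBS -l nodes=1:ppn=16:gpus=2

# Option B: separate resource request (single node, default cores per GPU)
#PBS -l gpus=2
```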
The GPUs are constrained to the jobs (like the CPU cores), but do not run in so-called "exclusive" mode.
The GPUs run with the so-called "persistence daemon", so the GPUs are not re-initialised between jobs.
Some important points of attention:

For MPI jobs, we recommend the (new) mypmirun wrapper from the vsc-mympirun module (pmi is the background mechanism used to start the MPI tasks, and is different from the usual
mpirun that is used by the
mympirun wrapper). At some later point, we might promote the
mypmirun tool or rename it, to avoid confusion in the naming.
Sharing GPUs requires MPS. The Slurm built-in MPS does not really do what you want, so we will provide integration with the mypmirun and wurker wrappers.
For parallel work, we are working on a
wurker wrapper from the
vsc-mympirun module that supports GPU placement and MPS, without any limitations with respect to the requested resources (i.e. it also supports the case where GPUs are spread heterogeneously over nodes, e.g. from using --gpus).

wurker will try to do the most optimised placement of cores and tasks, will provide 1 (optimal) GPU per task/MPI rank, and will set one so-called visible device (i.e.
CUDA_VISIBLE_DEVICES only has 1 ID). The actual devices are not constrained to the ranks, so you can access all devices requested in the job. We know that, at this moment, this is not working properly, but we are working on it. We advise against trying to fix this yourself.
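As a minimal sketch of what this means inside a task: each rank would see a single ID in CUDA_VISIBLE_DEVICES, which you can inspect from your job script (the value assigned below is illustrative only; in a real job the wrapper sets it for you):

```shell
# Illustrative only: in a real job CUDA_VISIBLE_DEVICES is set by the wrapper.
CUDA_VISIBLE_DEVICES="2"
# count the IDs in the (comma-separated) list of visible devices
ngpus=$(echo "$CUDA_VISIBLE_DEVICES" | tr ',' '\n' | grep -c .)
echo "this rank sees $ngpus visible device(s): $CUDA_VISIBLE_DEVICES"
```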
Software with GPU support
Use module avail to check for centrally installed software.
The subsections below only cover a couple of the installed software packages; more are available.

GROMACS

Use module avail GROMACS for a list of installed versions.
Horovod

Horovod can be used for (multi-node) multi-GPU TensorFlow/PyTorch calculations.
Use module avail Horovod for a list of installed versions.
Horovod supports TensorFlow, Keras, PyTorch and MXNet (see
https://github.com/horovod/horovod#id9), but should be run as an MPI application with
mypmirun. (Horovod also provides its own wrapper
horovodrun; we are not sure whether it handles placement and other aspects correctly.)
At least for simple TensorFlow benchmarks, Horovod looks a bit faster than the usual autodetect multi-GPU TensorFlow without Horovod, but this comes at the cost of the code modifications needed to use Horovod.
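A minimal sketch of launching a Horovod-enabled script this way (the script name train.py is a placeholder, and the unversioned module load is an assumption; check module avail Horovod for the actual versions to load):

```shell
# Hypothetical launch of a Horovod-enabled TensorFlow script via mypmirun;
# train.py is a placeholder for your own Horovod-enabled script.
module load Horovod
mypmirun python train.py
```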
PyTorch

Use module avail PyTorch for a list of installed versions.
TensorFlow

Use module avail TensorFlow for a list of installed versions.
Note: for running TensorFlow calculations on multiple GPUs and/or on more than one workernode, use
Horovod; see section Horovod.
Example TensorFlow job script
#!/bin/bash
#PBS -l walltime=5:0:0
#PBS -l nodes=1:ppn=quarter:gpus=1
module load TensorFlow/2.6.0-foss-2021a-CUDA-11.3.1
cd $PBS_O_WORKDIR
# run your TensorFlow script (example.py is a placeholder name)
python example.py
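Assuming the script above is saved under a name of your choice (tf_job.sh below is a placeholder), it can be submitted as usual:

```shell
# swap to the GPU cluster of your choice, then submit the job script
module swap cluster/joltik
qsub tf_job.sh
```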
AlphaFold

Use module avail AlphaFold for a list of installed versions.
For more information on using AlphaFold, we strongly recommend the VIB-UGent course available at https://elearning.bits.vib.be/courses/alphafold.
In case of questions or problems, please contact the HPC-UGent team via firstname.lastname@example.org, and clearly
indicate that your question relates to the
joltik cluster by adding
[joltik] in the email subject.