Torque frontend via jobcli#
What is Torque#
Torque is a resource manager for submitting and managing jobs on an HPC cluster. It is an implementation of PBS (Portable Batch System).
Since Torque is no longer widely used, the HPC-UGent infrastructure switched its backend from Torque to Slurm in 2021.
The Torque user interface, which consists of commands like qsub and qstat, was kept, however, so that researchers would not have to learn new commands to submit and manage jobs.
Slurm backend#
Slurm is a resource manager for submitting and managing jobs on an HPC cluster, similar to Torque (but more advanced/modern in some ways). Currently, Slurm is the most popular workload manager on HPC systems worldwide, but it has a user interface that is different and in some sense less user friendly than Torque/PBS.
jobcli#
Jobcli is a Python library that was developed by the HPC-UGent team to let the HPC-UGent infrastructure combine a Torque frontend with a Slurm backend. In addition, it adds some extra options to the Torque commands. Put simply, jobcli can be thought of as a Python script that "translates" Torque commands into equivalent Slurm commands and, in the case of qsub, also makes some changes to the provided job script to make it compatible with Slurm.
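To illustrate the idea of such a translation (this is a minimal sketch, not the actual jobcli implementation), a function that maps a couple of common #PBS directives to their #SBATCH equivalents could look like this:

```python
# Minimal sketch (NOT the actual jobcli code) of translating a few common
# #PBS directives into equivalent #SBATCH directives.
def translate_torque_header(pbs_lines):
    """Translate a list of '#PBS ...' directives into '#SBATCH ...' ones."""
    sbatch = []
    for line in pbs_lines:
        opts = line.removeprefix("#PBS").strip()
        if opts.startswith("-l nodes="):
            # e.g. "-l nodes=1:ppn=8" -> --nodes=1 and --ntasks-per-node=8
            spec = opts[len("-l nodes="):]
            nodes, _, ppn = spec.partition(":ppn=")
            sbatch.append(f'#SBATCH --nodes="{nodes}"')
            if ppn:
                sbatch.append(f'#SBATCH --ntasks-per-node="{ppn}"')
        elif opts.startswith("-l walltime="):
            # e.g. "-l walltime=2:30:00" -> --time=2:30:00
            sbatch.append(f'#SBATCH --time="{opts[len("-l walltime="):]}"')
        elif opts.startswith("-N "):
            # e.g. "-N example" -> --job-name=example
            sbatch.append(f'#SBATCH --job-name="{opts[3:]}"')
    return sbatch
```

For example, translate_torque_header(["#PBS -l nodes=1:ppn=8"]) yields the --nodes and --ntasks-per-node directives shown in the dryrun output further below; the real jobcli handles many more options than this sketch.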
Additional options for Torque commands supported by jobcli#
help option#
Adding --help to a Torque command when using it on the HPC-UGent infrastructure will output an extensive overview of all supported options for that command (both the original Torque options and the ones added by jobcli), with a short description of each.
For example:
$ qsub --help
usage: qsub [--version] [--debug] [--dryrun] [--pass OPTIONS] [--dump PATH]...
Submit job script
positional arguments:
script_file_path Path to job script to be submitted (default: read job
script from stdin)
optional arguments:
-A ACCOUNT Charge resources used by this job to specified account
...
dryrun option#
Adding --dryrun to a Torque command when using it on the HPC-UGent infrastructure will show which Slurm commands jobcli generates for that Torque command, without actually executing the Slurm backend command.
See also the examples below.
debug option#
Similarly to --dryrun, adding --debug to a Torque command when using it on the HPC-UGent infrastructure will show which Slurm commands jobcli generates for that Torque command. In contrast to --dryrun, however, using --debug will actually run the Slurm backend command.
See also the examples below.
Examples#
The following examples illustrate how the --dryrun and --debug options work, using an example job script example.sh:
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l walltime=2:30:00
module load SciPy-bundle/2023.11-gfbf-2023b
python script.py > script.out.${PBS_JOBID}
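The script.py that the job script runs is not part of this example; as a hypothetical stand-in, any Python program that writes to stdout would do, for instance:

```python
# Hypothetical stand-in for script.py (not part of the original example):
# a tiny NumPy computation whose output the job script redirects to
# script.out.${PBS_JOBID}. NumPy is provided by the SciPy-bundle module.
import numpy as np

result = int(np.arange(8).sum())  # sum of 0..7
print("sum of 0..7 =", result)
```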
Example of the dryrun option#
Running the following command:
$ qsub --dryrun example.sh -N example
will generate this output:
Command that would have been run:
---------------------------------
/usr/bin/sbatch
Job script that would have been submitted:
------------------------------------------
#!/bin/bash
#SBATCH --chdir="/user/gent/400/vsc40000"
#SBATCH --error="/kyukon/home/gent/400/vsc40000/examples/%x.e%A"
#SBATCH --export="NONE"
#SBATCH --get-user-env="60L"
#SBATCH --job-name="example"
#SBATCH --mail-type="NONE"
#SBATCH --nodes="1"
#SBATCH --ntasks-per-node="8"
#SBATCH --ntasks="8"
#SBATCH --output="/kyukon/home/gent/400/vsc40000/examples/%x.o%A"
#SBATCH --time="02:30:00"
### (start of lines that were added automatically by jobcli)
#
# original submission command:
# qsub --dryrun example.sh -N example
#
# directory where submission command was executed:
# /kyukon/home/gent/400/vsc40000/examples
#
# original script header:
# #PBS -l nodes=1:ppn=8
# #PBS -l walltime=2:30:00
#
### (end of lines that were added automatically by jobcli)
#!/bin/bash
module load SciPy-bundle/2023.11-gfbf-2023b
python script.py > script.out.${PBS_JOBID}
The most important lines in this output are the ones that start with #SBATCH, since these contain the translation of the Torque options to Slurm options. For example, the job name is the one we specified with the -N option in the command.
With this dryrun, you can see that changes were only made to the header; the job script itself is not changed at all. If the job script uses any PBS-related constructs, like $PBS_JOBID, they are retained. Slurm is configured on the HPC-UGent infrastructure such that common PBS_* environment variables are defined in the job environment, next to their Slurm equivalents.
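As a small sketch of what that compatibility enables, a helper inside a job could look up the job id via the PBS-style variable first and fall back to the Slurm-native one (the variable names below are the standard ones; exactly which PBS_* variables are defined on the job environment is an assumption here):

```python
import os

# Sketch: resolve the current job id, preferring the PBS-style variable
# (kept for Torque compatibility) over the Slurm-native one.
def current_job_id(environ=os.environ):
    for var in ("PBS_JOBID", "SLURM_JOB_ID"):
        if var in environ:
            return environ[var]
    return None  # not running inside a job
```

In a job environment where both variables are set to the same id, either lookup gives the same result, which is why scripts written for the Torque frontend keep working on the Slurm backend.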
Example of the debug option#
Similarly to the --dryrun example, we start by running the following command:
$ qsub --debug example.sh -N example
which generates this output:
DEBUG: Submitting job script location at example.sh
DEBUG: Generated script header
#SBATCH --chdir="/user/gent/400/vsc40000"
#SBATCH --error="/kyukon/home/gent/400/vsc40000/examples/%x.e%A"
#SBATCH --export="NONE"
#SBATCH --get-user-env="60L"
#SBATCH --job-name="example"
#SBATCH --mail-type="NONE"
#SBATCH --nodes="1"
#SBATCH --ntasks-per-node="8"
#SBATCH --ntasks="8"
#SBATCH --output="/kyukon/home/gent/400/vsc40000/examples/%x.o%A"
#SBATCH --time="02:30:00"
DEBUG: HOOKS: Looking for hooks in directory '/etc/jobcli/hooks'
DEBUG: HOOKS: Directory '/etc/jobcli/hooks' does not exist, so no hooks there
DEBUG: Running command '/usr/bin/sbatch'
64842138