
Job script examples#

Simple job script template#

This is a template for a job script, with commonly used parameters. The basic parameters should always be used. Some notes on the situational parameters:

  • -l mem: If no memory parameter is given, the job gets access to an amount of memory proportional to the number of cores requested. See also: Job failed: SEGV Segmentation fault

  • -m/-M: the -m option sends emails to the email address registered with your VSC account. Use the -M option only if you want the emails sent to a different address.

  • Replace the "-placeholder text-" entries with real values. This notation is used so that qsub rejects the option if a placeholder is accidentally left in.

  • To use a situational parameter, remove one '#' at the beginning of the line.

simple_jobscript.sh

#!/bin/bash

# Basic parameters
#PBS -N jobname           ## Job name
#PBS -l nodes=1:ppn=2     ## 1 node, 2 processors per node (ppn=all to get a full node)
#PBS -l walltime=01:00:00 ## Max time your job will run (no more than 72:00:00)

# Situational parameters: remove one '#' at the front to use
##PBS -l gpus=1            ## GPU amount (only on accelgor or joltik)
##PBS -l mem=32gb          ## Memory request (if not used, memory is proportional to the requested cores)
##PBS -m abe               ## Email notifications (abe=aborted, begin and end)
##PBS -M -email_address-   ## ONLY if you want to use a different email than your VSC address
##PBS -A -project-         ## Project name when credits are required (only Tier 1)

##PBS -o -filename-        ## Output log
##PBS -e -filename-        ## Error log


module load [module]
module load [module]

cd $PBS_O_WORKDIR         # Change working directory to the location where the job was submitted

[commands]
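
Once the placeholders are filled in, the script can be submitted from the login node. A minimal sketch (the exact job ID format depends on the cluster):

qsub simple_jobscript.sh   # submit the job; prints the ID of the queued job
qstat                      # show the status of your queued and running jobs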

Single-core job#

Here's an example of a single-core job script:

single_core.sh

#!/bin/bash
#PBS -N count_example         ## job name
#PBS -l nodes=1:ppn=1         ## single-node job, single core
#PBS -l walltime=2:00:00      ## max. 2h of wall time
module load Python/3.6.4-intel-2018a
# copy input data from location where job was submitted from
cp $PBS_O_WORKDIR/input.txt $TMPDIR
# go to temporary working directory (on local disk) & run
cd $TMPDIR
python -c "print(len(open('input.txt').read()))" > output.txt
# copy back output data, ensure unique filename using $PBS_JOBID
cp output.txt $VSC_DATA/output_${PBS_JOBID}.txt
  1. Using #PBS header lines, we specify the resource requirements for the job; see Appendix B for a list of these options.

  2. A module for Python 3.6 is loaded; see also section Modules.

  3. We stage the data in: the file input.txt is copied into the "working" directory; see chapter Running jobs with input/output data.

  4. The main part of the script runs a small Python program that counts the number of characters in the provided input file input.txt.

  5. We stage the results out: the output file output.txt is copied from the "working" directory ($TMPDIR) to a unique filename in $VSC_DATA. For a list of possible storage locations, see subsection Pre-defined user directories.
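
Putting the staging steps together, a possible way to prepare and submit this job (the input file contents and submission directory are just examples):

cd $VSC_DATA                          # directory holding the input data
echo "some example text" > input.txt  # create a small test input file
qsub single_core.sh                   # $PBS_O_WORKDIR will point to this directory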

Multi-core job#

Here's an example of a multi-core job script that uses mympirun:

multi_core.sh

#!/bin/bash
#PBS -N mpi_hello             ## job name
#PBS -l nodes=2:ppn=all       ## 2 nodes, all cores per node
#PBS -l walltime=2:00:00      ## max. 2h of wall time
module load intel/2017b
module load vsc-mympirun      ## We don't use a version here; this is on purpose
# go to working directory, compile and run MPI hello world
cd $PBS_O_WORKDIR
mpicc mpi_hello.c -o mpi_hello
mympirun ./mpi_hello

An example MPI hello world program can be downloaded from https://github.com/hpcugent/vsc-mympirun/blob/master/testscripts/mpi_helloworld.c.
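
One way to fetch this file and submit the job (the raw URL below is the usual GitHub raw-content form of the link above, assuming wget is available on the login node):

wget https://raw.githubusercontent.com/hpcugent/vsc-mympirun/master/testscripts/mpi_helloworld.c -O mpi_hello.c
qsub multi_core.sh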

Running a command with a maximum time limit#

If you are not sure your job will finish before its walltime runs out, and you still want your output data copied back, you have to stop the main command before the walltime is exceeded and copy the data back while time remains.

This can be done with the timeout command, which limits how long a program may run and kills it once that limit is exceeded.
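To see what timeout does, a quick interactive check (GNU coreutils timeout exits with status 124 when it kills the command):

timeout 2s sleep 10   # sleep is killed after 2 seconds
echo $?               # prints 124: the command timed out

Here's an example job script using timeout: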

timeout.sh

#!/bin/bash
#PBS -N timeout_example
#PBS -l nodes=1:ppn=1        ## single-node job, single core
#PBS -l walltime=2:00:00     ## max. 2h of wall time

# go to temporary working directory (on local disk)
cd $TMPDIR
# This command will take too long (1400 minutes is longer than our walltime)
# $PBS_O_WORKDIR/example_program.sh 1400 output.txt

# So we put it after a timeout command.
# The walltime is 120 minutes (2 x 60). We instruct the program to run for
# 100 minutes, but kill it after 90 minutes, which leaves 30 minutes to
# copy files back. This should be more than enough.
timeout -s SIGKILL 90m $PBS_O_WORKDIR/example_program.sh 100 output.txt
# copy back output data, ensure unique filename using $PBS_JOBID
cp output.txt $VSC_DATA/output_${PBS_JOBID}.txt
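
If you need to know whether the program finished or was killed, you can check timeout's exit status before copying the data back. A sketch (with -s SIGKILL, a killed command yields status 137, i.e. 128+9):

timeout -s SIGKILL 90m $PBS_O_WORKDIR/example_program.sh 100 output.txt
if [ $? -eq 137 ]; then
    # the program was killed by timeout; note this in the output
    echo "example_program.sh was killed after 90 minutes" >> output.txt
fi
cp output.txt $VSC_DATA/output_${PBS_JOBID}.txt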

The example program used in this script is a dummy script that simply sleeps for a specified number of minutes:

example_program.sh

#!/bin/bash
# This is an example program
# It takes two arguments: a number of times to loop and a file to write to
# In total, it will run for (the number of times to loop) minutes

if [ $# -ne 2 ]; then
    echo "Usage: ./example_program.sh amount filename" && exit 1
fi

for ((i = 0; i < $1; i++)); do
    echo "${i} => $(date)" >> "$2"    # append a timestamped line to the output file
    sleep 60
done
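
For a quick local test with small (hypothetical) values, the script can be run directly; it appends one timestamped line per minute:

./example_program.sh 2 test_output.txt   # runs for about 2 minutes
cat test_output.txt                      # one "i => date" line per iteration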