Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things

Content

Automatically requeuing of jobs if not enough licenses are available

A common problem with queuing systems and commercial software using floating licenses is that you cannot easily guarantee that the licenses you need are available when your job starts. Some queuing systems and schedulers can consider license usage – the solution at RRZE does not (at least not reliably).

A partial solution (although by far not optimal) is outlined below. With effectively two additional lines in your job script you can at least ensure that your job gets requeued if not enough licenses are available – and does not just abort. (The risk for race conditions which are not detected of course still exists, and you may have to wait again some time until compute resources are available for your new jobs … but better than only seeing the error message after the weekend …

[shell]
#!/bin/bash -l
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -N myjob

# it is important that “bash” is executed on the first line above!
#
# check for 16 hpcdomains and 1 starpar license and automatically
# requeue the job if not enough licenses are available right now.
# This check is based on the situation right now – it may
# change just in the next second, thus, there is no guarantee
# that the license is still available in just a few moments.
# We do not checkout, borrow or reserve anything here!
# CHANGE license server and feature list according to your needs!
# instead of $CDLMD_LICENSE_FILE you can use the PORT@SERVER syntax
/apps/rrze/bin/check_lic.sh -c $CDLMD_LICENSE_FILE hpcdomains 16 starpar 1

# the next line must follow immediately after the check_lic.sh line
# with no commands in between!
# (the “.” at the beginning is also correct and important)
. /apps/rrze/bin/check_autorequeue.sh

# now continue with your normal tasks …
# if there were not enough licenses in the preliminary check,
# the script will not come until here but it got requeued.
[/shell]

This approach is not at all limited to STAR-CD and should work on Cluster32 and Woody.

 

ATTENTION: this approach does NOT work if license throttling is active, i.e. in cases where licenses are in principle available but the license server limits the number of licenses you or your group may get by using some MAX setting in the option file on the license server!

Most licenses at RRZE are throttled, thus, the check_lic.sh and check_autorqueue.sh scripts are of limited use only these days.