Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things

Running STAR-CCM+ jobs on Woody

We now have a first user running STAR-CCM+ in parallel on the Woody cluster. Starting jobs in batch mode seems to be quite easy. As STAR-CCM+ uses HP-MPI internally, InfiniBand should be used automatically, too (although I have not explicitly verified this yet).

Here is what this user currently uses (again, I have no idea whether automatic stopping actually works with STAR-CCM+, so there may be room for improvement):

[shell]
#!/bin/bash -l
#PBS -l nodes=2:ppn=4
#PBS -l walltime=24:00:00
#PBS -N some-jobname

cd $PBS_O_WORKDIR

module add star-ccm+/3.04.008

# specify the time you want to have to save results, etc.
export TIME4SAVE=800

# detect number of available CPUs (should be modified for Nehalems with active SMT)
ncpus=`cat $PBS_NODEFILE | wc -l`

# STAR-CCM+ starts a master plus $ncpus slaves; on Woody it's o.k. to
# oversubscribe the nodes in this way (i.e. ncpus+1 processes on ncpus CPUs);
# however, on Nehalem nodes (e.g. TinyBlue) it seems to be a very bad idea.
# To avoid oversubscription, uncomment the following line
## ncpus=$(($ncpus-1))

# check if enough licenses should be available
/apps/rrze/bin/check_lic.sh -c $CDLMD_LICENSE_FILE ccmpsuite 1 hpcdomains $(($ncpus-1))
. /apps/rrze/bin/check_autorequeue.sh

export MPIRUN_OPTIONS="-v -prot"
# or with pinning: e.g.
## export MPIRUN_OPTIONS="-v -prot -cpu_bind=v,rank,v"
## export MPIRUN_OPTIONS="-v -prot -cpu_bind=v,MAP_CPU:0,1,2,3,4,5,6,7,v"

# if there are messages about "mpid: Not enough shared memory" you may try to set
# the maximum shared memory size in bytes by hand – but usually the message means
# that there is really not enough memory available; so forget about this option!
## export MPI_GLOBMEMSIZE=…

export MPI_REMSH=ssh

# automatically detect how much time this batch job requested and adjust the
# sleep attempt;
# make sure you defined the "stop file" within STAR-CCM+ accordingly
( sleep ` qstat -f $PBS_JOBID | awk -v t=$TIME4SAVE \
'{if ( $0 ~ /Resource_List.walltime/ ) \
{ split($3,duration,":"); \
print duration[1]*3600+duration[2]*60+duration[3]-t }}' `; \
touch ABORT ) >& /dev/null &
export SLEEP_ID=$!

starccm+ -batch -np $ncpus -machinefile $PBS_NODEFILE -load myjob.sim

pkill -P $SLEEP_ID
[/shell]
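
The sleep time in the script above is simply the requested walltime in seconds minus TIME4SAVE. As a minimal sketch, the same arithmetic can be tried outside of a batch job. The function name `walltime_to_sleep` is hypothetical and not part of the original script, and clamping negative results to zero is an added assumption:

```shell
#!/bin/bash
# Hypothetical helper (not part of the original job script):
# convert a walltime string HH:MM:SS into the number of seconds to sleep
# before touching the stop file, leaving "margin" seconds for saving results.
#   $1 = walltime as HH:MM:SS
#   $2 = margin in seconds (TIME4SAVE in the job script)
walltime_to_sleep() {
    echo "$1" | awk -F: -v t="$2" '{
        s = $1*3600 + $2*60 + $3 - t   # same arithmetic as in the job script
        if (s < 0) s = 0               # assumption: never return a negative sleep time
        print s
    }'
}
```

For example, `walltime_to_sleep 24:00:00 800` prints 85600, i.e. the stop file would be created 800 seconds before the 24 hours requested from the batch system are over. In the job script itself, the walltime string is taken from the `Resource_List.walltime` line of `qstat -f $PBS_JOBID`.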