Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things


Stopping STAR-CD at the latest just before the wallclock time is exceeded

A similar approach to the one described for CFX in a previous post is also possible for STAR-CD, as shown in the following snippet (thanks to one of our users for the feedback!):
[shell]
#!/bin/bash -l
#PBS -l nodes=2:ppn=4
#PBS -l walltime=24:00:00
#PBS -N somename

# Change to the directory where qsub was made
cd $PBS_O_WORKDIR

### add the module of the STAR-CD version, e.g. 4.02
module add star-cd/4.02_64bit

# specify the time needed to write the result and info files, e.g. 900 seconds
export TIME4SAVE=900

#automatically detect how much time this job requested and
#adjust the sleep accordingly
( sleep `qstat -f $PBS_JOBID | awk -v t=$TIME4SAVE \
'{if ( $0 ~ /Resource_List.walltime/ ) \
{ split($3,duration,":"); \
print duration[1]*3600+duration[2]*60+duration[3]-t }}'`; \
star -abort ) >& /dev/null &
export SLEEP_ID=$!

# the normal STAR-CD start follows …
star -dp `cat $PBS_NODEFILE`

pkill -P $SLEEP_ID
[/shell]

Automatic requeuing of jobs if not enough licenses are available

A common problem with queuing systems and commercial software using floating licenses is that you cannot easily guarantee that the licenses you need are available when your job starts. Some queuing systems and schedulers can consider license usage – the solution at RRZE does not (at least not reliably).

A partial solution (although by far not optimal) is outlined below. With effectively two additional lines in your job script you can at least ensure that your job gets requeued if not enough licenses are available – and does not just abort. (The risk of undetected race conditions of course still exists, and you may again have to wait some time until compute resources become available for your requeued job … but that is still better than only seeing the error message after the weekend.)

[shell]
#!/bin/bash -l
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -N myjob

# it is important that "bash" is executed on the first line above!
#
# check for 16 hpcdomains and 1 starpar license and automatically
# requeue the job if not enough licenses are available right now.
# This check is based on the situation right now – it may
# change just in the next second, thus, there is no guarantee
# that the license is still available in just a few moments.
# We do not checkout, borrow or reserve anything here!
# CHANGE license server and feature list according to your needs!
# instead of $CDLMD_LICENSE_FILE you can use the PORT@SERVER syntax
/apps/rrze/bin/check_lic.sh -c $CDLMD_LICENSE_FILE hpcdomains 16 starpar 1

# the next line must follow immediately after the check_lic.sh line
# with no commands in between!
# (the "." at the beginning is also correct and important)
. /apps/rrze/bin/check_autorequeue.sh

# now continue with your normal tasks …
# if there were not enough licenses in the preliminary check,
# the script never reaches this point; the job has been requeued instead.
[/shell]

This approach is not at all limited to STAR-CD and should work on Cluster32 and Woody.
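
For another FlexLM-licensed code, the same two-line pattern applies. The following sketch uses made-up feature names ("solver", "gui") and a made-up license server address, so adapt them to the product you actually run:

[shell]
# hypothetical example: check for 8 "solver" and 1 "gui" licenses on a
# FlexLM server given in PORT@SERVER syntax, then requeue if necessary
/apps/rrze/bin/check_lic.sh -c 27000@license.example.com solver 8 gui 1

# must follow immediately, with no commands in between
. /apps/rrze/bin/check_autorequeue.sh
[/shell]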

 

ATTENTION: this approach does NOT work if license throttling is active, i.e. in cases where licenses are in principle available but the license server limits the number of licenses you or your group may get by using some MAX setting in the option file on the license server!

Most licenses at RRZE are throttled; thus, the check_lic.sh and check_autorequeue.sh scripts are of limited use these days.

Compiling user subroutines for STAR-CD at RRZE

STAR-CD can be extended with user subroutines. To compile the user code, a compatible Fortran compiler is required. Unfortunately, CD-adapco's standard compiler for Linux is still Absoft, for which no license is available at RRZE. However, at least for most versions of STAR-CD on Linux x86_64, a PGI build is available as well. Thanks to the financial engagement of one of the main user groups, a PGI license can now be used on the frontends of the Woody cluster (and also on sfront03). If you have user subroutines, please generally use one of the 64-bit PGI STAR-CD versions (usually, but not always, with "pgi" in the module name). To compile the user code, log in to woody.rrze (or sfront03.rrze if you run your simulations e.g. on the opteron queue of Cluster32, the "Transtec cluster"), load the appropriate STAR-CD module together with "pgi/6.2-2", and compile your user routines using star [-dp] -ufile. Afterwards, you can submit your job as usual. Automatic compilation of the user subroutines from within a job file may or may not work; thus, please compile them in advance.
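
A minimal sketch of this compile step on the login node; the STAR-CD module name below is only an example and may differ from the versions actually installed:

[shell]
# on woody.rrze (or sfront03.rrze): load a 64-bit PGI-based STAR-CD module
# plus the matching PGI compiler (module name is an example)
module add star-cd/4.02_64bit_pgi
module add pgi/6.2-2

# compile the user subroutines in the case directory
# (add -dp for the double-precision version)
star -ufile
[/shell]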

An important note: STAR-CD 3.2x does not work together with the latest PGI versions (7.x). Thus, you have to explicitly select version 6.2-2 of the PGI compiler; nobody has yet tested which PGI versions are compatible with STAR-CD 4.0x … please drop a note if you are the first volunteer.

Attention: Interactive access to thor.rrze for compiling user code is no longer possible (and also no longer necessary). Use the woody login nodes or sfront03 instead.

Thus, the description in my previous article is in principle still valid.

STAR-CD 4.06 has just been installed. If the corresponding module is loaded, the appropriate PGI compiler module will be loaded automatically.

Existing star.reg files may cause problems: with STAR-CD 4.06 we just discovered a very strange behavior: if a star.reg file is present, star -ufile tries to ssh to the first node listed in PNP_HOSTS (if present in star.reg) and to do the compilation there – which of course usually fails, as users are not allowed to log in to batch nodes on which they do not currently have a job running. Defining a COMPILERHOST is no solution either, as (1) star.reg is still evaluated and (2) STAR-CD tries to make an ssh connection to COMPILERHOST, which works but then fails because no PGI module is loaded there.
To sum up: if you have to compile user subroutines, do this on the login nodes by calling star [-dp] -ufile, but make sure that there is no star.reg file in the current directory. As star.reg does not contain critical information, it should be safe to just delete it if it is in the way.
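
In practice this boils down to the following two commands on the login node (sketch; add -dp only if you need the double-precision version):

[shell]
# remove a stale star.reg so that the compilation happens locally
rm -f star.reg
star -ufile
[/shell]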

CFX-11.0SP1 and Windows CCS

According to Ansys, CFX-11 is not supported on Windows CCS. According to Microsoft, running CFX-11 on Windows CCS is no problem at all …

When we first tried some months ago on our small Windows CCS test cluster, we suffered from very strange error messages complaining about wrong paths, etc. even if only the GUIs were started on the terminal server (login node). Thus, we gave up again quite soon as diving into all details did not seem to be worth the effort.

I now tried again with a fresh installation of ANSYS/CFX-11 on our Windows CCS headnode – and what a surprise, no error messages any more.

At least for the CFX solver itself, it does not seem to be a problem if UNC paths are listed in ...\ANSYS Inc\v110\CFX\conf\hosts.ccl for the compute nodes – although the CFX documentation mentions that UNC paths do not work for the CFX solver.

The default settings in ...\ANSYS Inc\v110\CFX\cfxccs.pl are a bit strange – we do not have a \\headnode\cfxworkdir\ directory where everyone is allowed to write, and it also seems to be a bad idea to always use just stdout and stderr as the output file names. Thus, I locally made the following changes to cfxccs.pl in the shared directory of the CCS headnode:

  my $workingdirectory='//'.$hostname.'/ccsshare/'.$ENV{USERNAME};
  my $stdoutfile=$workingdirectory.'/stdout.%CCP_JOBID%.txt';
  my $stderrfile=$workingdirectory.'/stderr.%CCP_JOBID%.txt';

Now all output first of all goes into the user’s own shared directory, and second, the stdout/stderr files get the job-ID appended (and the suffix txt so that they are automatically opened with a text editor).

Initial tests look fine – let’s see what the users say.

Installation of OpenFOAM

As there was some interest in OpenFOAM (“The Open Source CFD Toolbox”), I started installing it on our Woody cluster – can’t be too difficult, I thought.

Unfortunately, the pre-compiled binaries did not work as we have to run SuSE SLES9SP3 on this cluster (owing to the HP SFS parallel file system) and SLES9SP3 does not contain the required versions of gcc, openssl and probably some more packages.

Well, compiling from sources should not be a problem, and then we can link against our "supported" Intel MPI library. No problem, right? Well, unpacking the OpenFOAM sources in an NFS directory takes ages (no surprise – almost 44k files/directories get extracted), they use their own build system, … To cut a long story short, I gave up on the Intel compilers and Intel MPI for the moment – gcc and the provided Open-MPI are used for now. Compilation takes ages (again no surprise, as the installation directory grows to 1.1 GB) … and Java complains about missing com.sun.j3d.utils.* – ah, you have to install Java 3D in addition (why didn't the documentation mention this?) …

O.k., the first compilation is done (in 32-bit, with the integrated Open-MPI and probably neither Infiniband support nor PBS/Torque integration included). Now let's build module files to integrate OpenFOAM into our environment-loading scheme. This requires quite some work, as more than 30 environment variables have to be set or modified. (Thanks to LRZ for the work they already did on HLRB2 – that was a good starting point, although it did not completely fit our needs.) But at least foamInstallationTest now does not report any errors!
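
For illustration, a heavily abbreviated sketch of what such an environment setup does in plain shell terms; the installation path is a placeholder and only a handful of the more than 30 variables are shown:

[shell]
# sketch only: a real module file sets or modifies far more variables,
# and the real layout uses per-architecture subdirectories
export WM_PROJECT=OpenFOAM
export WM_PROJECT_VERSION=1.4.1
export WM_PROJECT_DIR=/apps/OpenFOAM/OpenFOAM-1.4.1   # placeholder path
export PATH=$WM_PROJECT_DIR/bin:$PATH
export LD_LIBRARY_PATH=$WM_PROJECT_DIR/lib:$LD_LIBRARY_PATH
[/shell]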

The first (solved) problem was that the nsd daemon of OpenFOAM tries to create some sort of lock file (ns.ref) in $WM_PROJECT_DIR/.OpenFOAM-1.4.1/apps/FoamX – this directory of course is on the NFS server and not writable by users. Copying the FoamX subdirectory to the user’s directory and adjusting $FOAMX_CONFIG solved the issue. Any better solution?
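
My workaround, as a sketch (the target directory under $HOME is an arbitrary choice):

[shell]
# copy the FoamX configuration to a user-writable location and tell FoamX
# to use it instead of the read-only copy on the NFS server
cp -r $WM_PROJECT_DIR/.OpenFOAM-1.4.1/apps/FoamX $HOME/FoamX-config
export FOAMX_CONFIG=$HOME/FoamX-config
[/shell]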

A 64-bit compilation also finished, in around 4 h (again with OpenFOAM defaults only). However, both the 32-bit and the 64-bit version lack the ParaView integration; thus, some commands like paraFoam currently fail. Obviously, the ParaView sources are required at compile time, too.

http://www.tfd.chalmers.se/~hani/kurser/OF_phD_2007/downloadCompileAndRun.pdf seems to contain good guidelines for compiling and getting paraFoam et al. working … but just copying the original binary of libPVFoamReader.so did not do the trick for me.

On the other hand, adding PBS/Torque and Infiniband support to the provided Open-MPI seems to be easy; I only had to add --with-tm=$OUR_TORQUE --with-openib=$OUR_OFED to $WM_PROJECT_DIR/src/Allwmake and recompile just Open-MPI. Torque of course has to be compiled with support for position-independent code or as a shared library (cf. http://www.open-mpi.de/faq/?category=building#build-rte-tm). As we only have 64-bit OFED and Torque libraries, only the 64-bit build of OpenFOAM will have built-in support for them.
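
In shell terms, the modification boils down to passing two additional options to Open-MPI's configure call inside that Allwmake script (on top of the options already used there); both paths below are placeholders for the local Torque and OFED installations:

[shell]
# placeholders for the local installation prefixes
OUR_TORQUE=/opt/torque   # Torque built with PIC / as shared library
OUR_OFED=/usr            # OFED (OpenIB) prefix

# the two options appended to Open-MPI's configure line
./configure --with-tm=$OUR_TORQUE --with-openib=$OUR_OFED
[/shell]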

Let’s see if some users really will use it (and what they complain about).

More problems? Probably yes …

HPC and CFD courses in spring 2008

There will be a number of courses on HPC and CFD topics during the next months:

  • Tutorial: Programming with Fortran 95/2003: Object orientation and design patterns, February 4th-6th, LRZ-Munich (video transmission to RRZE possible if there is enough interest). For more details check http://www.lrz-muenchen.de/services/compute/courses/#OOFortran
  • Workshop: Performance Analysis of parallel programms with VAMPIR, February 7th, LRZ-Munich (video transmission to RRZE possible if there is enough interest). For more details check http://www.lrz-muenchen.de/services/compute/courses/#Vampir
  • NUMET 2008, 10.-13. März 2008, LSTM-Erlangen. For more details check http://www.lstm.uni-erlangen.de/numet2008/
  • Introductory course on High Performance Computing, March 17th-20th, RRZE. For more details check http://www.rrze.uni-erlangen.de/news/meldungen/meldung.shtml/9483

Also HLRS is organizing a couple of its annual courses and workshops during the next months: http://www.hlrs.de/news-events/events/

Microsoft campus day with a special focus on Windows-Compute-Cluster (CCS)

For some time now, the HPC group at RRZE has been operating a small Windows Compute Cluster (cf. http://www.rrze.uni-erlangen.de/dienste/arbeiten-rechnen/hpc/systeme/windows-cluster.shtml).

Up to now, we have observed only very little interest among our "normal" HPC users. Therefore, a Microsoft campus day with a special focus on Windows CCS is being organized on January 17th at the Faculty of Economics; see http://www.rrze.uni-erlangen.de/news/meldungen/meldung.shtml/9484 for more details.

Sun Studio 12 and open-mpi

Sun Studio has a long tradition and provides a lot of tools beyond just the compilers. It is available free of charge not only for Solaris (Sparc and x86/x64) but also for Linux (x64). To give it a try, I just installed it on one of our systems. Unfortunately, Sun does not (yet) provide the Sun ClusterTools for Linux; thus, I had to compile an MPI library on my own.

As we do not have any experience with open-mpi either, I gave it a try at the same time. Unfortunately, open-mpi (1.2.3) requires some patching to get compiled with the Sun Studio 12 compilers (as documented in the open-mpi FAQ). But besides that, there were no problems getting the PBS/Torque TM interface and OpenFabrics Infiniband support included.

Next, how to run MPI programs from within batch jobs? We are used to Pete Wyckoff's mpiexec, which we extended with a quite powerful wrapper to allow pinning of processes (using PLPA), specification of round-robin or block assignment of processes, partially filled nodes, etc. Open-mpi comes with its own PBS TM interface; thus, the next step will be to figure out how all this functionality can be provided with open-mpi's mpirun. So far, I did not find an option to use nodes only partially. But at least there are --bynode and --byslot (the default), and I quickly got some MPI PingPong numbers between nodes – the numbers look reasonable …
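
A minimal job-script sketch using open-mpi's mpirun with the TM integration; the binary name and process count are placeholders, and thanks to the Torque TM interface no machine file has to be passed explicitly:

[shell]
#!/bin/bash -l
#PBS -l nodes=2:ppn=4
#PBS -l walltime=01:00:00
#PBS -N ompitest

cd $PBS_O_WORKDIR

# mpirun learns the allocated nodes from Torque via the TM interface;
# --bynode places ranks round robin across nodes, --byslot (the default)
# fills up one node after the other
mpirun -np 8 --bynode ./a.out
[/shell]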

A promising start for more explorations in the future.

If you run with many MPI processes on large SMP systems, e.g. Sun's Niagara2 systems, you might need to increase the "open files" limit significantly, e.g. by issuing ulimit -n 32768; otherwise, the MPI startup may fail with messages like "unable to create shared memory mapping".
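
For example (a sketch, using the value quoted above), raised right before the MPI start in the job script:

[shell]
# raise the per-process "open files" limit before starting many ranks per node
ulimit -n 32768
mpirun -np 64 ./a.out
[/shell]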