Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things

Why cfx5solve from Ansys-13.0 fails on SuSE SLES11SP2 …

Recently, the operating system of one of RRZE’s HPC clusters was upgraded from SuSE SLES10 SP4 to SuSE SLES11 SP2 … one of the few things which broke due to the OS upgrade is Ansys/CFX-13.0. cfx5solve now aborts with

ccl2flow: * command language error *
Message: getChildList: unable to find the requested path
Context: returned by cclApi call

As one can expect, Ansys does not support running Ansys-13.0 on SuSE SLES11 or SLES11 SP2. There are also lots of reports on this error for different unsupported OS versions in the CFX forum at cfd-online but no explanations or workarounds yet.

So, where does the problem come from? A long story starts …

First guess: SuSE SLES11 SP2 runs a 3.0 kernel. Thus, there might be some script which does not correctly parse the output of uname or the like. However, the problem persists if cfx5solve is run using uname26 (or the equivalent long setarch variant). On the other hand, the problem does not occur if e.g. a CentOS-5 chroot is started on the SLES11 SP2 kernel, i.e. still the same kernel but old user space. This clearly indicates that it is not a kernel issue but some library or tool problem.
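A minimal sketch of the uname26 experiment, assuming setarch from util-linux is installed; mycase.def is a placeholder name and the helper function show_kernel_views is made up for illustration. It only shows what a program sees as kernel release with and without the 2.6 personality.

```shell
#!/bin/bash
# Show the kernel release as seen natively and under the "2.6" personality.
# setarch and its --uname-2.6 switch come from util-linux; uname26 is the short form.
show_kernel_views() {
    echo "native:  $(uname -r)"
    if command -v setarch >/dev/null 2>&1; then
        # Falls back to "unavailable" if the personality cannot be set (e.g. in a sandbox).
        echo "uname26: $(setarch "$(uname -m)" --uname-2.6 uname -r 2>/dev/null || echo unavailable)"
    else
        echo "uname26: setarch not installed"
    fi
}

show_kernel_views

# The solver could be started the same way (mycase.def is a placeholder):
#   setarch x86_64 --uname-2.6 cfx5solve -def mycase.def
```

If the error persists under the 2.6 personality, as it did here, kernel version parsing can be ruled out.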

Next guess: Perl comes bundled with Ansys/CFX but it might be some other command line tool from the Linux distribution which is used by cfx5solve, e.g. sed and friends or some changed bash behavior. Using strace on cfx5solve reveals several calls of such tools. But actually, none of them is problematic.

Thus, it must be a library issue: Ansys/CFX comes with most of the libraries it needs bundled but there is always the glibc, i.e. /lib64/ld-linux-x86-64.so.2, /lib64/libc.so.6, etc. SuSE SLES10 used glibc-2.4, RHEL5 uses glibc-2.5 but SLES11 SP2 uses glibc-2.11.3.

The glibc cannot be overridden using LD_LIBRARY_PATH like any other library. But there are ways to do it anyway …

The error message suggests that ccl2flow.exe is causing the problems. So, let’s run that with an old glibc version. As cfx5solve allows specifying a custom ccl2flow binary we can use a simple shell script to call the actual ccl2flow.exe using the loader and glibc libraries from the CentOS5 glibc-2.5. Nothing changes; still the very same getChildList error message in the out file. Does that mean that ccl2flow.exe is not the bad guy?
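One way such a wrapper could look is sketched below. It assumes that an old glibc has been unpacked below a directory of its own; the path /opt/centos5-glibc and the function name run_with_old_glibc are hypothetical. The dynamic loader is called explicitly and told via its --library-path option where to find libc and friends.

```shell
#!/bin/bash
# OLD_GLIBC_ROOT is a hypothetical directory holding an unpacked CentOS 5 glibc
# (lib64/ld-linux-x86-64.so.2, lib64/libc.so.6, ...).
OLD_GLIBC_ROOT=${OLD_GLIBC_ROOT:-/opt/centos5-glibc}

# Start a program through the old loader. --library-path takes the place of
# LD_LIBRARY_PATH for this process, so the bundled library directories
# are appended again. With DRY_RUN=1 the command is only printed.
run_with_old_glibc() {
    local loader="$OLD_GLIBC_ROOT/lib64/ld-linux-x86-64.so.2"
    local libpath="$OLD_GLIBC_ROOT/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$loader" --library-path "$libpath" "$@"
    else
        "$loader" --library-path "$libpath" "$@"
    fi
}

# Inside a wrapper script, the real binary would be started like this
# (the location of ccl2flow.exe is installation specific):
#   run_with_old_glibc /path/to/ccl2flow.exe "$@"
```

Calling the loader directly avoids touching the system glibc, so only the wrapped program sees the old version.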

Interlude: Let’s see how ccl2flow.exe is called. The shell wrapper for ccl2flow was already there, thus, let’s add some echo statements for the command line arguments and a sleep statement to inspect the working directory. Et voilà. On a good system, a quite long ccl file has just been created before ccl2flow is called; however, on a bad system running the new OS the ccl file is almost empty. Thus, we should not blame ccl2flow.exe but what happens before. Well, before there is just the Ansys supplied perl running.
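A debugging wrapper of this kind might look roughly as follows. This is only a sketch: the function name trace_call, the log file location and the default pause of 60 seconds are arbitrary choices for illustration.

```shell
#!/bin/bash
# Record how a program is called, pause so that the working directory can be
# inspected from another terminal, then run the real program.
#   $1     = the real binary
#   $2 ... = the arguments it was called with
trace_call() {
    local real=$1; shift
    {
        echo "cwd:  $(pwd)"
        echo "args: $*"
        ls -l                       # snapshot of the working directory
    } >> "${TRACE_LOG:-/tmp/ccl2flow-trace.log}"
    sleep "${TRACE_PAUSE:-60}"
    "$real" "$@"
}

# In the wrapper script (path to the real binary is installation specific):
#   trace_call /path/to/ccl2flow.exe "$@"
```

Comparing the logged directory listing of a good and a bad system shows quickly whether the input files already differ before the wrapped program starts.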

Let’s have a closer look at the perl script: Understanding what the cfx5solve Perl script does seems to be impossible. Even if the Perl script is traced on a good and a bad system there are no real insights. At some point, the bad system does not return an object while the other does. Thus, let’s run perl using the old glibc version. That’s a little bit more tricky as cfx5solve is not a binary but a shell script which calls another shell script before finally calling an Ansys-supplied perl binary. But one can also manage these additional difficulties. Et voilà, the error message disappeared. What’s going on? Perl is running fine but producing different results depending on the glibc version.

Interlude Ansys/CFX-14.0: This version is officially only supported on SuSE SLES11 but not SLES11 SP2 if I got it correctly. But it runs fine on SLES11 SP2, too. What Perl version do they use? Exactly the same version, even the very same binary (i.e. both binaries have the same checksum!). Thus, it is not the Perl itself but some CFX-specific library it dynamically loads.

End of the story? Not yet, but almost. Having already spent so much time on the problem, I finally wanted to know which glibc versions are good or evil. I already knew Redhat’s glibc-2.5 is good and SuSE’s glibc-2.11.3 is evil. Thus, let’s try the versions in between using the official sources from ftp.gnu.org/gnu/glibc. Versions <2.10 or so require a fix for the configure script to recognize a modern as or ld as a good version. A few versions do not compile properly at all on my system. But there is no bad version; even with 2.11.3 there is no CFX error. Only from glibc-2.12.1 on is there the well-known ccl2flow error. Not really surprising. SuSE and other Linux distributors have long lists of patches they apply, including back-ports from newer releases. There are almost 100 SuSE patches included in their version of glibc-2.11.3-17.39.1; no chance to see what they are doing.

My next guess is that the problem must be a commit between 2.11.3 and 2.12.1 of the official glibc versions. GNU provides a Git repository and git bisect is your friend. This leads to commit f89d2f30 from Dec. 2009: Enable multiarch whenever possible. This commit did not change any actual code but only the default configuration parameters. That means the code causing the fault must have been in the sources long before. It only surfaced once multi-arch was switched on in 2.12.1 of the vanilla version, or earlier in the SuSE version (the spec file contains an --enable-multi-arch line; verified).
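Such a bisection can be automated with git bisect run, which interprets the exit code of a test script: 0 marks a commit as good, 125 skips it (useful for versions that do not compile), and other codes from 1 to 127 mark it as bad. The helper below is a sketch of that mapping only; the function name bisect_verdict is made up, and how glibc is built and how cfx5solve is run against it in each step is left out.

```shell
#!/bin/bash
# Translate the outcome of one bisect step into an exit code for "git bisect run".
#   $1 = exit status of the glibc build for this commit
#   $2 = output file written by the solver run against that build
bisect_verdict() {
    local build_status=$1 solver_log=$2
    [ "$build_status" -eq 0 ] || return 125     # build failed: skip this commit
    if grep -q 'getChildList: unable to find the requested path' "$solver_log"; then
        return 1                                # the known ccl2flow error: bad
    fi
    return 0                                    # solver got past ccl2flow: good
}

# Typical use (the tag names are assumptions about the glibc repository):
#   git bisect start glibc-2.12.1 glibc-2.11.3    # <bad> <good>
#   git bisect run ./bisect-step.sh               # script ends with bisect_verdict
```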

Going back in history, it finally turns out that glibc commit ab6a873f from Jun 2009 (SSSE3 strcpy/stpcpy for x86-64) is responsible for the problems leading to the failing ccl2flow.

Unfortunately, it is not possible to see if the most recent glibc versions still cause problems as cfx5solve already aborts earlier with some error message (Can’t call method “numstring” on an undefined value).

It is also not clear whether it is a glibc error, a problem in one of the CFX libraries, or whether it is just due to the tools used when Ansys-13.0 was compiled.

End of the story: If you are willing to take the risk of getting wrong results, you may make v130/CFX/tools/perl-5.8.0-1/bin/Linux-x86_64/perl use an older glibc version (or one compiled without multi-arch support) and thus avoid the ccl2flow error. But who knows what else fails visibly or behind the scenes. There is an unknown risk of wrong results even if cfx5solve now runs in principle on SuSE SLES11 SP2.

I fully understand that users do not want to switch versions within a running project. Thus, it is really a pity that ISVs force users (and sys admins) to run very old OS versions. SuSE SLES 10 was released in 2006 and will reach end of general support in July 2013; SLES11 was released in March 2009 while Ansys13 was released only in autumn 2010. And we still shall stick to SLES10? It’s time to increase the pressure on ISVs or to start developing in-house codes again.

STAR-CCM+ fails with “mpid: Not enough shared memory”

If STAR-CCM+ fails on large shared memory nodes with the message “mpid: Not enough shared memory”, your sysadmin might need to increase the kernel limits for SHMMAX (maximum size of shared memory segment in bytes), i.e. sysctl -w kernel.shmmax=.... Especially, the Ubuntu/Debian default of 32 MB seems to be too small even for 2-socket nodes with 8-core AMD Opteron processors, i.e. 16 cores/node …
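Before bothering the sysadmin, the current limit can be read from /proc/sys/kernel/shmmax (or with sysctl -n kernel.shmmax). The following sketch compares it with a wanted minimum; the 1 GiB used in the usage comment is an arbitrary example value and not a recommendation from STAR-CCM+, and the function name shmmax_ok is made up.

```shell
#!/bin/bash
# Succeed if the current SHMMAX (bytes) is at least the wanted value (bytes).
# awk is used for the comparison because SHMMAX can exceed the integer range
# of shell arithmetic on some kernels.
shmmax_ok() {
    awk -v cur="$1" -v want="$2" 'BEGIN { exit !(cur + 0 >= want + 0) }'
}

# Usage (1 GiB is only an example value):
#   shmmax_ok "$(cat /proc/sys/kernel/shmmax)" $((1024 * 1024 * 1024)) \
#       || echo "kernel.shmmax is below 1 GiB, ask your sysadmin to raise it"
```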

Recipe for building OpenFOAM-1.7.1 with Intel Compilers and Intel MPI

Compared with other software, installing OpenFOAM is (still) a nightmare. They use their very own build system, there are tons of environment variables to set, etc. But it seems that users in academia and industry accept OpenFOAM nevertheless. For release 1.7.1, I took the time to create a little recipe (in some parts very specifically tailored to RRZE’s installation of software packages) to more or less automatically build OpenFOAM and some accompanying Third Party packages from scratch using the Intel Compilers (icc/icpc) and Intel MPI instead of Gcc and Open MPI (only Qt and Paraview are still built using gcc). The script is provided as-is without any guarantee that it works elsewhere and of course also without any support. The script assumes that the required source code packages have already been downloaded. Where necessary, the unpacked sources are patched and the compilation commands are executed. Finally, two new tar balls are created which contain the required “output” for a clean binary installation, i.e. intermediate output files (e.g. *.dep) are not included …

Compilation takes ages, but that’s not really surprising. Only extracting the tar balls with the sources amounts to 1.6 GB in almost 45k files/directories. After compilation (although neither Open MPI nor Gcc are built) the size is increased to 6.5 GB or 120k files. If all intermediate compilation files are removed, there are still about 1 GB or 30k files/directories remaining in my “clean installation” (with only the Qt/ParaView libraries/binaries in the ThirdParty tree).

RRZE users find OpenFOAM-1.7.1 as a module on Woody and TinyBlue. The binaries used for Woody and TinyBlue are slightly different as they were natively compiled on SuSE SLES 10SP3 and Ubuntu 8.04, respectively. The main difference should only be in the Qt/Paraview part as SLES10 and Ubuntu 8.04 come with different Python versions. ParaView should also be compiled with MPI support.

Note (2012-06-08): to be able to compile src/finiteVolume/fields/fvPatchFields/constraint/wedge/wedgeFvPatchScalarField.C with recent versions of the Intel compiler, one has to patch this file to avoid a “no instance of overloaded function "Foam::operator==" matches the argument list” error message; cf. http://www.cfd-online.com/Forums/openfoam-installation/101961-compiling-2-1-0-rhel6-2-icc.html and https://github.com/OpenFOAM/OpenFOAM-2.1.x/commit/8cf1d398d16551c4931d20d9fc3e42957d0f93ca. These links are for OF-2.1.x but the fix works for OF-1.7.1 as well.

local OpenFOAM mailing list

OpenFOAM is a widely used open-source software for computational fluid dynamics (CFD). There is also a growing number of groups on our campus which use or at least give OpenFOAM a try. I never applied OpenFOAM for CFD simulations myself – I only spent lots of hours installing it on RRZE’s clusters. But from what I heard from actual users, documentation seems to be rather poor resulting in a slow learning curve. To facilitate and stimulate a coordinated communication and self-help of the different OpenFOAM users and groups at the University of Erlangen-Nuremberg, a local mailing list has been set up. OpenFOAM users from outside of the University of Erlangen-Nuremberg are also welcome if they give a substantial contribution – but keep in mind that this local mailing list is not an official OpenFOAM support forum.

The subscription policy for the mailing list is “open”, i.e. everyone can directly subscribe/unsubscribe. Posts to the mailing list are only allowed from registered users (i.e. from the email address used for subscription) – all other messages require approval by the moderator to prevent spam.

For further information and (un)subscription, please visit the webpage of the rrze-openfoam-user mailing list.

Installation of OpenFOAM-1.5 on Woody

Using pre-built binaries

Does not work as SuSE SLES10SP1 is too different …; one very strange thing is that the gcc-4.3.1 included in the ThirdParty packages does work in 32-bit but complains about an incompatible library in 64-bit mode although the library is a correct 64-bit *.so

Building from sources using the Intel Compiler (10.1)

Fails due to problems with C++ templates, etc.

Building gcc-4.3.x manually

Has too many dependencies to quickly do it (MPFR and GMP – the SuSE SLES10SP1 versions are too old)

Building from sources using gcc-4.2.2

Requires a patch for autoRefineDriver.C to avoid a fatal “invalid conversion” error message; cf. http://openfoam.cfd-online.com/cgi-bin/forum/show.cgi?tpc=126&post=24786

Some general tricks

  • use WM_COMPILER_INST=System to specify that the system’s gcc should be used and not one which is part of OpenFOAM
  • use WM_NCOMPPROCS to specify the number of CPUs on which make may run concurrently (i.e. the value for make’s -j parameter)
  • at least for building ParaView, cmake (and of course Qt-4.3.x) must be available; it is likely that PV3FoamReader must be compiled afterwards (cf. OF-Readme) and it also depends on cmake
  • when building ParaView, CMAKE_HOME and WM_COMPILER_DIR must be set as some sed commands fail otherwise
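The first two tricks could be put into the shell environment as sketched below. Taking the core count from getconf is an assumption of this sketch; whether exporting the variables before sourcing the OpenFOAM environment is enough, or whether the OpenFOAM settings files have to be edited instead, is not covered here.

```shell
#!/bin/bash
# Use the gcc of the operating system instead of the one shipped with OpenFOAM.
export WM_COMPILER_INST=System

# Let make run as many concurrent jobs as there are online cores
# (falls back to 1 if getconf cannot tell).
export WM_NCOMPPROCS=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

echo "building with the system gcc using $WM_NCOMPPROCS parallel make jobs"
```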

Running STAR-CCM+ jobs on Woody

We now have a first user who is using STAR-CCM+ in parallel on the Woody cluster. Starting jobs in batch mode seems to be quite easy. As STAR-CCM+ internally uses HP-MPI, Infiniband should automatically be used correctly, too (although I did not explicitly verify this yet).

Here is what this user currently uses (again no idea if automatic stopping actually works with STAR-CCM+, thus, there might be room for improvements):

#!/bin/bash -l
#PBS -l nodes=2:ppn=4
#PBS -l walltime=24:00:00
#PBS -N some-jobname

cd  $PBS_O_WORKDIR

module add star-ccm+/3.04.008

# specify the time you want to have to save results, etc.
export TIME4SAVE=800

# detect number of available CPUs (should be modified for Nehalems with active SMT)
ncpus=`cat $PBS_NODEFILE | wc -l`

# STAR-CCM+ starts a master plus $ncpus slaves; on Woody it's o.k. to
# oversubscribe the nodes in this way (i.e. ncpus+1 processes on ncpus cores);
# however, on Nehalem nodes (e.g. TinyBlue) it seems to be a very bad idea.
# To avoid oversubscription, uncomment the following line
## ncpus=$(($ncpus-1))

# check if enough licenses should be available
/apps/rrze/bin/check_lic.sh -c $CDLMD_LICENSE_FILE ccmpsuite 1 hpcdomains $(($ncpus-1))
. /apps/rrze/bin/check_autorequeue.sh

export MPIRUN_OPTIONS="-v -prot"
# or with pinning: e.g.
## export MPIRUN_OPTIONS="-v -prot -cpu_bind=v,rank,v"
## export MPIRUN_OPTIONS="-v -prot -cpu_bind=v,MAP_CPU:0,1,2,3,4,5,6,7,v"

# if there are messages about "mpid: Not enough shared memory" you may try to set
# the maximum shared memory size in bytes by hand - but usually the message means
# that there is really not enough memory available; so forget about this option!
## export MPI_GLOBMEMSIZE=...

export MPI_REMSH=ssh

# automatically detect how much time this batch job requested and adjust the
# sleep attempt;
# make sure you defined the "stop file" within STAR-CCM+ accordingly
( sleep ` qstat -f $PBS_JOBID | awk -v t=$TIME4SAVE                   \
    '{if ( $0 ~ /Resource_List.walltime/ )                            \
        { split($3,duration,":");                                     \
          print duration[1]*3600+duration[2]*60+duration[3]-t }}' `;  \
 touch ABORT ) >& /dev/null  &
export SLEEP_ID=$!

starccm+ -batch -np $ncpus -machinefile $PBS_NODEFILE -load myjob.sim

pkill -P $SLEEP_ID
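The awk one-liner in the job script turns the requested walltime (HH:MM:SS, as reported by qstat -f) into seconds and subtracts the time reserved for saving. The same arithmetic as a stand-alone function, for illustration only; the function name sleep_seconds is made up, and clamping negative values to 0 is an addition of this sketch.

```shell
#!/bin/bash
# Print how long to sleep before the stop file has to be created.
#   $1 = requested walltime as HH:MM:SS
#   $2 = seconds to reserve for saving results
sleep_seconds() {
    echo "$1" | awk -F: -v t="$2" '{
        s = $1 * 3600 + $2 * 60 + $3 - t
        if (s < 0) s = 0            # never hand a negative value to sleep
        print s
    }'
}

sleep_seconds 24:00:00 800    # prints 85600
```

With a walltime of 24 hours and TIME4SAVE=800, the ABORT file is thus touched 85600 seconds after the job started.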

stopping STAR-CD at latest just before the wallclock time is exceeded

A similar approach to the one described for CFX is also possible for STAR-CD as shown in the following snippet (Thanks to one of our users for the feedback!):

#!/bin/bash -l
#PBS -l nodes=2:ppn=4
#PBS -l walltime=24:00:00
#PBS -N somename

#  Change to the directory where qsub was made
cd $PBS_O_WORKDIR

### add the module of the STAR-CD version, e.g. 4.02
module add star-cd/4.02_64bit

# specify the time needed to write the result and info files, e.g. 900 seconds
export TIME4SAVE=900

#automatically detect how much time this job requested and
#adjust the sleep accordingly
( sleep ` qstat -f $PBS_JOBID | awk -v t=$TIME4SAVE                   \
    '{if ( $0 ~ /Resource_List.walltime/ )                            \
        { split($3,duration,":");                                     \
          print duration[1]*3600+duration[2]*60+duration[3]-t }}' `;  \
  star -abort ) >& /dev/null  &
export SLEEP_ID=$!

# the normal STAR-CD start follows ...
star -dp `cat  $PBS_NODEFILE`

pkill -P $SLEEP_ID

Automatic requeuing of jobs if not enough licenses are available

A common problem with queuing systems and commercial software using floating licenses is that you cannot easily guarantee that the licenses you need are available when your job starts. Some queuing systems and schedulers can consider license usage – the solution at RRZE does not (at least not reliably).

A partial solution (although by far not optimal) is outlined below. With effectively two additional lines in your job script you can at least ensure that your job gets requeued if not enough licenses are available – and does not just abort. (The risk of undetected race conditions of course still exists, and you may have to wait again some time until compute resources are available for your new jobs … but that is better than only seeing the error message after the weekend …)

#!/bin/bash -l
#PBS -l nodes=1:ppn=1
#PBS -l walltime=12:00:00
#PBS -N myjob

# it is important that "bash" is executed on the first line above!
#
# check for 16 hpcdomains and 1 starpar license and automatically
# requeue the job if not enough licenses are available right now.
# This check is based on the situation right now - it may
# change just in the next second, thus, there is no guarantee
# that the license is still available in just a few moments.
# We do not checkout, borrow or reserve anything here!
# CHANGE license server and feature list according to your needs!
# instead of $CDLMD_LICENSE_FILE you can use the PORT@SERVER syntax
/apps/rrze/bin/check_lic.sh -c $CDLMD_LICENSE_FILE hpcdomains 16 starpar 1

# the next line must follow immediately after the check_lic.sh line
# with no commands in between!
# (the "." at the beginning is also correct and important)
. /apps/rrze/bin/check_autorequeue.sh

# now continue with your normal tasks ...
# if there were not enough licenses in the preliminary check,
# the script does not get this far because the job has been requeued.

This approach is not at all limited to STAR-CD and should work on Cluster32 and Woody.

ATTENTION: this approach does NOT work if license throttling is active, i.e. in cases where licenses are in principle available but the license server limits the number of licenses you or your group may get by using some MAX setting in the option file on the license server!

Most licenses at RRZE are throttled, thus, the check_lic.sh and check_autorequeue.sh scripts are of limited use only these days.
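To illustrate what a check of this kind has to do, and why throttling defeats it, here is a sketch that counts free licenses from the text a FlexLM-style lmstat -a call prints. This is not how check_lic.sh works internally (that is not documented here); the function name free_licenses and the assumed layout of the "Users of ..." lines are assumptions of this sketch.

```shell
#!/bin/bash
# Print the number of free licenses of one feature, reading license status
# text on stdin. Assumed line layout:
#   Users of FEATURE:  (Total of N licenses issued;  Total of M licenses in use)
# Exits with status 1 if the feature is not listed.
free_licenses() {
    awk -v feat="$1" '
        $1 == "Users" && $2 == "of" && $3 == (feat ":") {
            print $6 - $11          # issued minus in use
            found = 1
        }
        END { if (!found) exit 1 }'
}

# Usage, given some command that prints the license status:
#   lmstat -a -c "$CDLMD_LICENSE_FILE" | free_licenses hpcdomains
```

The counts reported by the server are global. A MAX limit in the option file of the license server does not show up in them, which is why such a check reports free licenses although the job will not get any.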

Compiling user subroutines for STAR-CD at RRZE

STAR-CD can be extended with user subroutines. To compile the user code, a compatible Fortran compiler is required. Unfortunately, CD adapco’s standard compiler for Linux still is Absoft, for which no license is available. However, at least for most versions of STAR-CD for Linux x86_64, a PGI version is also available. Thanks to the financial engagement of one of the main user groups, a PGI license can now be used on the frontends of the Woody cluster (and also on sfront03). If you have user subroutines, please generally use one of the 64-bit PGI-STAR-CD versions (usually but not always with “pgi” in the module name). To compile the user code, log in to woody.rrze (or sfront03.rrze if you run your simulations e.g. on the opteron queue of Cluster32 (“Transtec cluster”)), load the appropriate STAR-CD module and “pgi/6.2-2”, and compile your user routines using star [-dp] -ufile. Now you can submit your job as usual. Automatic compilation of the user subroutines from within a job file may or may not work. Thus, please compile them in advance.

An important note: STAR-CD as of 3.2x does not work together with the latest PGI versions (7.x). Thus, you have to explicitly select version 6.2-2 of the PGI compiler; nobody tested yet which PGI versions are compatible with STAR-CD 4.0x … please drop a note if you are the first volunteer.

Attention: Interactive access to thor.rrze for compiling user code is no longer possible (and also no longer necessary). Use the woody login nodes or sfront03 instead.

The description in my previous article thus in principle is still valid.

STAR-CD 4.06 has just been installed. If the corresponding module is loaded, the appropriate PGI compiler module will be loaded automatically.

Existing star.reg files may cause problems: With STAR-CD 4.06 we just discovered a very strange behavior: if there is a star.reg file present, star -ufile tries to ssh to the first node listed in PNP_HOSTS (if present in star.reg) and do the compilation there – which of course usually fails as users are not allowed to log in to batch nodes without having a job running on them right now. Defining a COMPILERHOST is no solution either, as (1) star.reg is still evaluated and (2) STAR-CD tries to make an ssh connection to COMPILERHOST, which works, but the compilation then fails as there is no PGI module loaded.
To sum up: if you have to compile user subroutines, do this on the login nodes by calling star [-dp] -ufile but make sure that there is no star.reg file in the current directory. As star.reg does not contain critical information, it should be safe to just delete it if it is in the way.
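A small helper along these lines can take care of the star.reg file before compiling. It is only a sketch: the function name stash_star_reg is made up, and it renames the file instead of deleting it so that a copy is kept.

```shell
#!/bin/bash
# Move a star.reg file out of the way so that "star -ufile" compiles locally.
#   $1 = case directory (defaults to the current directory)
stash_star_reg() {
    local dir=${1:-.}
    if [ -e "$dir/star.reg" ]; then
        mv "$dir/star.reg" "$dir/star.reg.bak"
        echo "moved $dir/star.reg to $dir/star.reg.bak"
    fi
}

# On a login node, after loading the STAR-CD module:
#   stash_star_reg . && star -dp -ufile
```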