Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things

Content

Installation of OpenFOAM

As there was some interest in OpenFOAM (“The Open Source CFD Toolbox”), I started installing it on our Woody cluster – can’t be too difficult, I thought.

Unfortunately, the pre-compiled binaries did not work as we have to run SuSE SLES9SP3 on this cluster (owing to the HP SFS parallel file system) and SLES9SP3 does not contain the required versions of gcc, openssl and probably some more packages.

Well, compiling from sources should not be a problem, and then we can link against our “supported” Intel MPI library. No problem, right? Well, unpacking the OpenFOAM sources on an NFS directory takes ages (no surprise – almost 44k files/directories get extracted), they use their own build system, … To cut a long story short, I gave up on the Intel compilers and Intel MPI for the moment – gcc and the provided Open-MPI are used for now. Compilation takes ages (again no surprise as the installation directory grows to 1.1 GB) … and Java complains about missing com.sun.j3d.utils.* – ah, you have to install Java 3D in addition (why didn’t the documentation mention this?) …

O.k., the first compilation is done (in 32-bit, with the integrated Open-MPI and probably neither Infiniband support nor PBS/Torque integration included). Now let’s build module files to integrate OpenFOAM into the environment loading scheme. This requires quite some work as more than 30 environment variables have to be set or modified. (Thanks to LRZ for the work they had already done on HLRB2 – that was a good starting point although it did not completely fit our needs.) But at least foamInstallationTest does not report any errors now!
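Just to give an impression of what such a module file has to provide, here is a minimal bash sketch; the install prefix and the concrete values are assumptions about our setup, and the real list of variables is much longer:

[shell]
# assumed install prefix – adjust to the actual location
export FOAM_INST_DIR=/apps/OpenFOAM
export WM_PROJECT=OpenFOAM
export WM_PROJECT_VERSION=1.4.1
export WM_PROJECT_DIR=$FOAM_INST_DIR/$WM_PROJECT-$WM_PROJECT_VERSION
export WM_COMPILER=Gcc
# ... roughly 30 further variables (FOAM_APPBIN, FOAM_LIBBIN, MPI settings, ...)
export PATH=$WM_PROJECT_DIR/bin:$PATH
export LD_LIBRARY_PATH=$WM_PROJECT_DIR/lib:$LD_LIBRARY_PATH
[/shell]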

The first (solved) problem was that the nsd daemon of OpenFOAM tries to create some sort of lock file (ns.ref) in $WM_PROJECT_DIR/.OpenFOAM-1.4.1/apps/FoamX – this directory of course is on the NFS server and not writable by users. Copying the FoamX subdirectory to the user’s directory and adjusting $FOAMX_CONFIG solved the issue. Any better solution?
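For reference, the workaround currently looks roughly like this (the target directory in the user’s home is just an example):

[shell]
# copy the FoamX configuration to a writable location in the user's home
mkdir -p ~/OpenFOAM/FoamX
cp -r $WM_PROJECT_DIR/.OpenFOAM-1.4.1/apps/FoamX/. ~/OpenFOAM/FoamX/
# point FoamX to the writable copy
export FOAMX_CONFIG=~/OpenFOAM/FoamX
[/shell]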

A 64-bit compilation now also finished in around 4h (again with OpenFOAM defaults only). However, both the 32- and the 64-bit version lack the integration of ParaView, thus some commands like paraFoam currently fail. Obviously, the ParaView sources are required at compile time, too.

http://www.tfd.chalmers.se/~hani/kurser/OF_phD_2007/downloadCompileAndRun.pdf seems to contain good guidelines for compiling and getting paraFoam et al. working … But just copying the original binary of libPVFoamReader.so did not do the trick for me.

On the other hand, adding PBS/Torque and Infiniband support to the provided Open-MPI seems to be easy; I only added --with-tm=$OUR_TORQUE --with-openib=$OUR_OFED to $WM_PROJECT_DIR/src/Allwmake and recompiled just Open-MPI. Torque, of course, has to be compiled with support for position-independent code or as a shared library (cf. http://www.open-mpi.de/faq/?category=building#build-rte-tm). As we only have 64-bit OFED and Torque libraries, only the 64-bit build of OpenFOAM will have built-in support for them.
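In other words, the configure call for the bundled Open-MPI in $WM_PROJECT_DIR/src/Allwmake ends up with two additional options; the --prefix shown below is only my assumption about the existing line, and $OUR_TORQUE/$OUR_OFED stand for the local Torque and OFED installation prefixes:

[shell]
# excerpt of the Open-MPI configure line in $WM_PROJECT_DIR/src/Allwmake
./configure --prefix=$MPI_ARCH_PATH \
    --with-tm=$OUR_TORQUE \
    --with-openib=$OUR_OFED
[/shell]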

Let’s see if some users really will use it (and what they complain about).

More problems? Probably yes …

Common license pool for STAR-CD probably continues for the next three years

It took quite a while until a solution for prolonging the joint license pool with an increased number of licenses for parallel runs could be found. But everything seems to be settled now for the next three years.

Further chairs can join at any time – of course, the license may only be used for education and scientific research (and not for industrial research or projects). If additional groups join, this will not increase the total costs (unless additional license features are required) but will reduce the amount the individual groups have to pay annually …

Also check some notes if you use STAR-CD on RRZE’s new parallel computer Woody.

Running STAR-CD over Infiniband

STAR-CD 4.02 works out of the box; currently there are some warnings of the form ERROR: ld.so: object 'libmpi.so' from LD_PRELOAD cannot be preloaded: ignored. As these messages seem to be harmless, I’m not sure whether I’ll debug their cause further.

STAR-CD 3.26 also works out of the box.

User subroutines have not been tested yet; some additional steps will be required to get them compiled as the required PGI compiler is not installed locally …

In first tests, star -chkpnt failed with the message TAR checkpoint failed due to invalid "star.pst" file. or TAR checkpoint failed due to invalid "star.ccm" file.

=====================================================================

Using STAR-CD in principle works as follows (access to the STAR-CD module is restricted by ACLs)

  • prepare your input files on your local machine; the RRZE systems are not intended for interactive work.
    If, for some reason, you have to use the RRZE systems for pre-/postprocessing, do not start prostar, etc. on the login nodes but submit an interactive batch job using qsub -I -X -lnodes=1:ppn=4,walltime=1:00:00!
  • transfer all input files to the local filesystem on the Woody cluster using SSH (scp/sftp), i.e. copy them to /home/woody1/.../.../...
  • Use a batch file as follows:
    [shell]
    #!/bin/bash -l
    # DO NOT USE #!/bin/sh in the line above as module would not work; also the “-l” is required!
    #PBS -l nodes=2:ppn=4
    #PBS -l walltime=24:00:00
    #PBS -N STARCD-woody
    #… any other PBS option you like

    # let’s go to the directory where the script was submitted
    cd $PBS_O_WORKDIR

    # load the STAR-CD module; either “star-cd/3.26_64bit” or “star-cd/4.02_64bit”
    module add star-cd/3.26_64bit

    # here we go
    star -dp `cat $PBS_NODEFILE`
    [/shell]

  • submit your job to the PBS batch system using qsub (see the short example after this list)
  • wait until the job has finished
  • transfer the required result files to your local PC, analyze the results locally (using your fast graphics card)
  • delete all files you no longer need from the RRZE system as disk space is still valuable
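Submitting and checking the job then boils down to something like this (the script name is just an example):

[shell]
qsub starcd_job.sh     # returns the job ID
qstat -u $USER         # check the state of your jobs
[/shell]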

Some more details on the “ERROR: ld.so: object ‘libmpi.so’ from LD_PRELOAD cannot be preloaded: ignored” message: looking into the script which is actually used to call the STAR-CD binary, I can guess where the message might come from. It uses something like $HPMPI/bin/mpirun ... -e LD_PRELOAD=libmpi$PNP_DSO ... -f .starboot.mpi, i.e. no path is specified for the library to be preloaded. I’m not sure what the current policy of ld.so from glibc is: does it look at the current LD_LIBRARY_PATH (if set), or does it only look at “secure” (predefined) directories? If the latter is the case, star of course cannot find the library …
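If one wanted to verify that guess, preloading the library via its absolute path should make the warning disappear; the path below is only an assumption about where the HP-MPI library lives on a given system:

[shell]
# preload the HP-MPI shared library by absolute path instead of a bare file name
# (adjust the prefix to the actual HP-MPI installation)
export LD_PRELOAD=$HPMPI/lib/linux_amd64/libmpi.so
[/shell]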

Problems with STAR-CD-4.02 (64-bit) and Infiniband: Running the PGI variant of STAR-CD-4.02 (64-bit) over Infiniband (i.e. using HP-MPI) currently may fail on our new Woody cluster as well as on the Infiniband partition of the Transtec cluster. The observations (currently based on just a single test case) are as follows:

  • STAR-CD-4.02/PGI using HP-MPI/VAPI runs fine for a few iterations but then suddenly stops consuming CPU time on most of the nodes.
  • STAR-CD-4.02/PGI using HP-MPI/TCP or mpich runs fine.
  • STAR-CD-4.02/Absoft runs fine even with HP-MPI/VAPI!

Further tests are currently under way …

For the moment, module add star-cd/4.02_64bit; star -mpi=hp -mppflags="-v -prot -TCP" is the recommended way of starting STAR-CD-4.02.

Sometimes also problems of STAR-CD 3.26 with Infiniband: According to a user report, STAR-CD 3.26 with HP-MPI over Infiniband also has the problem that it suddenly stops running. It seems that the AMG preconditioner is the cause of the problems.

So, check whether Infiniband runs fine for your cases; if not (and only if not), add -mpi=hp -mppflags="-v -prot -TCP"

Upgraded IB stack seems to solve Infiniband problems: Updating from the Voltaire ibhost-stack to the Voltaire GridStack 4.1 (which is OFED-1.1 based) seems to have solved the issue with hanging STAR-CD processes. Please try to run without the argument -TCP!

As a technical note: the HP-MPI version which comes with STAR-CD 4.02 or 3.26 is too old to work with OFED; thus, the latest HP-MPI (i.e. 2.02.05) has been installed on Woody. The module files for STAR-CD have been adapted to automatically use this updated version. Your output (if -v -prot is used) should now show IBV instead of VAPI if the high speed network is used.

Using IPoverIB as fallback: If native Infiniband does not work even with the upgraded IB stack (e.g. due to a bug in connection with AMG), try IP-over-IB by using -mppflags="-v -prot -TCP -netaddr 10.188.84.0/255.255.254.0".

Another option available in HP-MPI (which is completely unrelated to the communication network, but I’m too lazy to create another thread right now) is the ability to pin processes to specific CPUs of a node. On Woody, -mppflags="... -cpu_bind=v,map_cpu:0,1,2,3 ..." is the right choice if you run with 4 CPUs per node; if you have huge memory requirements and thus only use every second core, the correct option would be -mppflags="... -cpu_bind=v,map_cpu:0,1 ...".
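Put together, the two cases might look like this inside the batch script; whether -mpi=hp or the node list is needed in addition depends on your setup as discussed above:

[shell]
# all four cores of a node in use
star -mppflags="-v -prot -cpu_bind=v,map_cpu:0,1,2,3"
# only every second core in use (e.g. large memory per process)
star -mppflags="-v -prot -cpu_bind=v,map_cpu:0,1"
[/shell]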

STAR-CD 4.0 available

STAR-CD 4.0 has recently become available for x86 and x86_64. Unfortunately, the current builds again require the ABSOFT compiler to include user subroutines 🙂 Further builds as well as support for IA64 and MS Windows will hopefully appear in the coming months …

STAR-CD 4.0 has already been installed on Cluster32. All STAR-CD users are encouraged to test this new version. If you are using user subroutines, some changes are probably required as CD-adapco moved from Fortran77 to Fortran90 and eliminated one common data structure. Check the documentation for more details.

For those running STAR-CD on their local systems, the installation files are available at the usual place.

Using STAR-CD with USER-routines on Cluster32

Using STAR-CD with USER-routines on Cluster32 is a little bit tricky because only a limited environment is available on the compute nodes; in particular, no compilers or linkers are available there. Therefore, it is necessary to compile the user routines before submitting the job to PBS (but after all input files are available).

Therefore, proceed as follows:

  1. prepare all input files and copy them to the directory where you plan to run the simulation
  2. execute the following commands in an interactive shell (i.e. login shell):
            module add star-cd/XXXX
            star [-dp] -ufile
    

    The optional -dp is necessary if you plan to use double precision.
    The star -ufile line compiles your user routines on the login node, where a complete development environment is available; see the compact sketch after this list.

  3. Now you can submit your job and the simulation can run on the compute nodes without need for a compiler.
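As a compact sketch of the whole procedure (the module version and the script name are placeholders/examples):

[shell]
# on the login node, in the directory holding the case and the user routines
module add star-cd/XXXX     # pick the STAR-CD version you actually use
star -ufile                 # add -dp if you run in double precision
# only afterwards submit the batch job
qsub starcd_job.sh
[/shell]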

STAR-CD 3.24 Update 025 on SGI Altix

For some reason, update 025 of STAR-CD 3.24 on our SGI Altix system tried to use Intel MPI by default instead of SGI MPT. After setting the environment variable STARFLAGS=-mpi=sgi, SGI MPT is used again for this version as well.
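Assuming a bash-like shell, this amounts to:

[shell]
# make STAR-CD 3.24.025 use SGI MPT instead of Intel MPI on the Altix
export STARFLAGS=-mpi=sgi
[/shell]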

STAR-CD 3.24 Updates

Updates to 3.24.006 of STAR-CD 3.24 for Linux (x86, x86_64, IA64) are now available. If there is demand, the update can be installed on the RRZE systems (which currently have 3.24.000) or made available to the institutes participating in the campus agreement.

Whether STAR-CD 3.24 is now also available for Windows is unclear …

It is also unclear whether there are “alternative” Linux builds with the Intel ifc compiler for the current release …

Hopefully more on these two topics soon.