Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things

MpCCI 3.0.5 released — but not yet working at RRZE

A new version of MpCCI (3.0.5) has been released. See www.mpcci.org for more details. Many things have changed; in particular, the MpCCI library itself is now much more encapsulated, so that it should be completely independent of the MPI version used within the application program.

However, owing to licensing problems, the new version could not be tested on the HPC systems of RRZE right away … Well, the solution was rather trivial: the license file did not contain the FQN of the server, so checking out or even testing the license from a remote host failed whenever the FQN was used there. The fix was simply to replace the shortened hostname by the full hostname (including the domain) or, alternatively, by the IP address. After restarting the license server, the licensing problems disappeared.
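
For illustration, the change in the license file was roughly of the following kind. This is only a sketch: whether MpCCI uses a FLEXlm-style license file with a SERVER line is an assumption here, and the hostname, host ID and port shown are placeholders.

    # before: short hostname only; checkout from remote hosts using the FQN fails
    SERVER licserver 0004ACDE4711 47055
    # after: fully qualified hostname (alternatively, the IP address also works)
    SERVER licserver.rrze.uni-erlangen.de 0004ACDE4711 47055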

In the meantime (early August 2006), MpCCI 3.0.5 is also available on the LRZ machines …

However, there still seem to be some problems on the application side; probably not a big surprise, as the internal structure of MpCCI 3.0.5 and its API have changed completely …

… especially on SGI Altix systems (or other machines where CPUsets are used), an ssh-based start mechanism cannot be used, as the CPUsets would be escaped and “batch” processes would start running on the interactive/boot CPUsets …

MpCCI on SGI Altix (3)

Tests of using the client-server mode with SGI MPT … a never-ending tragedy:

  • link your application with: ccilink -client -nompilink ..... -lmpi
  • produce the procgroup file with ccirun -server -norun ex2.cci
  • run your simulation:
    /opt/MpCCI/mpiexec/bin/mpiexec -server &
    /opt/MpCCI/mpiexec/bin/mpiexec -config=ccirun_server.procgroup >& s & x1 & x2 < /dev/null
    sleep 10
    
  • The number of CPUs requested must be at least as large as the number of server processes, i.e. those started with mpiexec.

    If you use mpich instead of MPT, all processes have to be started with mpiexec. As a consequence, the number of requested CPUs must be equal to the total number of processes; the PBS-TM interface used by mpiexec does not allow overbooking CPUs.

    … after many hours of testing: IT SEEMS THAT MpCCI WITH SGI MPT DOES *NOT* WORK RELIABLY AT ALL … mpi_comm_rank==0 on all processes 🙂 despite using the correct mpif.h files and mpirun commands for the server and client applications.

    My current conclusions:

    • MpCCI does not support SGI MPT natively
    • using mpich on SGI Altix for all communications is NO option as benchmarks showed that CFD applications are slower by a factor of 2 or more when using mpich instead of MPT
    • using MpCCI in client-server mode also seems not to work (see above)

    That means MpCCI is currently not usable at all on SGI Altix. Sorry for those who rely on it.

SGI Altix extension

Recently, the SGI Altix at RRZE has been extended. We now have a batch-only system altix-batch (an SGI Altix 3700 with 32 CPUs and 128 GB shared memory) and a front-end system altix (an SGI Altix 330 with 16 CPUs and 32 GB shared memory; 4 CPUs + 8 GB are used as login partition (boot cpuset) – the remaining ones are also used for batch processing).

An important thing to note is that the new machine has only half the amount of memory per CPU compared to the "old" one. As the cpusets introduced with SuSE SLES9/SGI ProPack 4.x do not have all the features known from the old SGI Origin cpusets (in particular, policy kill is missing), the system starts swapping as soon as one process exceeds the amount of memory available in its cpuset. As a result, the complete system becomes unresponsive.

Therefore, it is very important to request the correct machine or to specify the amount of memory required in addition to the number of CPUs. The amount of interactive work is also much more limited now, as the login partition (boot cpuset) only has access to 4 CPUs and 8 GB of memory!

Check the official web page of the SGI Altix systems at RRZE for more details and the correct syntax for specifying resource requirements.
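
For illustration, a resource request in a PBS job could look like the following sketch; the resource names ncpus and mem are assumptions based on standard PBS Professional syntax, so please take the exact names from the official page mentioned above.

    # request the batch-only machine, 8 CPUs and 16 GB of memory
    #PBS -l host=altix-batch
    #PBS -l ncpus=8
    #PBS -l mem=16gb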

MpCCI on SGI Altix (part 2)

MpCCI is a [commercial] tool (basically a library) which allows coupling different numerical codes. MpCCI is distributed in binary form only and depends on a number of other tools, in particular on MPI. For the Itanium architecture, only an mpich-based version is currently provided.
Using a standard MPICH (with the p4 device) on SGI Altix is a rather bad idea, as the ssh-based start mechanism does not respect CPUsets, proper clean-up is not guaranteed, etc.

Due to the problems related to the ssh-based start mechanism of a standard ch_p4 MPICH, the corresponding mpirun was removed on Sept. 14, 2005! This guarantees better stability of our SGI Altix system, but requires some additional steps for users of MpCCI:

  1. load as usual the module mpcci/3.0.3-ia64-glibc23 or mpcci/3.0.3-ia64-glibc23-intel9 (I hope both still work fine)
  2. compile your code as usual (and as in the past) – MPICHHOME and MPIROOTDIR are automatically set by the MpCCI module
  3. create your MpCCI specific input files
  4. interactively run ccirun -log -norun xxx.cci
  5. edit the generated ccirun.procgroup file:
    • on the first line, you have to add --mpcci-inputfile ccirun.inputfile (see the last line of the ccirun output)
    • on all lines you have to replace the number (either 0 or 1) after the hostname by a colon.
    • a complete ccirun.procgroup file now might look like
      altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile some-path/ccirun.inputfile
      altix : some-path/ccirun.fhp.itanium_mpccimpi.1.spawn
      altix : some-path/ccirun.Binary.2.spawn
      
  6. now prepare your PBS job file; use the following line to start your program – it replaces the previous mpirun line! (A complete job-file sketch is given after this list.)
    /opt/MpCCI/mpiexec-0.80/bin/mpiexec -mpich-p4-no-shmem -config=ccirun.procgroup
    
  7. and submit your job. The number of CPUs you request must be equal to (or larger than) the number of processes you start, i.e. you have to count the MpCCI control process!
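
For orientation, a minimal PBS job file could then look like the sketch below. The CPU count, walltime and module version are assumptions and have to be adapted; the mpiexec line is the one from step 6.

    #!/bin/bash
    # assumptions: two application processes plus the MpCCI control process,
    # one hour of walltime
    #PBS -l ncpus=3
    #PBS -l walltime=01:00:00

    cd $PBS_O_WORKDIR
    module load mpcci/3.0.3-ia64-glibc23
    # start all processes listed in the edited procgroup file via the PBS-TM interface
    /opt/MpCCI/mpiexec-0.80/bin/mpiexec -mpich-p4-no-shmem -config=ccirun.procgroup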

Some additional remarks:

  • it is not clear at the moment whether the runtime of such jobs can be extended once they are submitted/running. We’ll probably have to check this on an actual run …
  • if your code reads from STDIN, you need an additional step to get it working again (see the sketch after this list):
    • if you have something like read(*,*) x or read *,x you have to set the environment variable FOR_READ to the file which contains the input
    • if you have something like read(5,*) x or read 5,x you have to set the environment variable FORT5 to the file which contains the input
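
For example, a minimal sketch, assuming the input that used to come from STDIN is now in a file called input.dat (the file name is just a placeholder):

    # code using read(*,*) x or read *,x
    export FOR_READ=input.dat
    # code using read(5,*) x or read 5,x
    export FORT5=input.dat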

Two additional remarks regarding the ccirun.procgroup file for mpiexec:

  • for some reason, it seems to be necessary to use only the short hostname (e.g. “altix“) instead of the fully qualified hostname (e.g. altix.rrze.uni-erlangen.de)
  • with some applications, the first line in the procgroup file must be “altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile ...“, with other applications this line must be omitted (and the option “--mpcci-inputfile ...” has to be passed to the first actual executable)

Additional MpCCI remarks: … as we now have two different SGI Altix systems in our batch system, you either have to explicitly request one host using -l host=altix or -l host=altix-batch, or you have to generate the config file for mpiexec dynamically.
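
The latter could be done along the following lines; this is just a sketch that reuses the file names from the steps above and assumes that the short hostname is what belongs in the procgroup file:

    # replace the hostname column of the generated procgroup file by the
    # short hostname of the machine the job actually runs on
    HOST=$(hostname -s)
    sed "s/^[^ ]* :/$HOST :/" ccirun.procgroup > ccirun.procgroup.$HOST
    /opt/MpCCI/mpiexec/bin/mpiexec -config=ccirun.procgroup.$HOST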

In addition, mpiexec has been upgraded to a newer version. Just use /opt/MpCCI/mpiexec/bin/mpiexec to always get the latest version. -mpich-p4-no-shmem is no longer necessary as it is now the compiled-in default.

MpCCI on SGI Altix

MpCCI is a library for coupling different codes (e.g. fluid mechanics and aeroacoustics) and exchanging mesh-based data between them. It is developed and sold by the Fraunhofer Institute SCAI (http://www.scai.fraunhofer.de/mpcci.html).

MpCCI is available for several platforms and relies on MPI for communication.

The good news: the MpCCI SDK is available for IA64.
The bad news: it relies on mpich.

On our SGI Altix system we use PBS Professional as batch queuing system and each running job gets its own CPU-set.

When an MpCCI job is started, a procgroup file is generated and the processes are started via ssh. And that’s exactly the problem: the sshd daemon (started by root at boot time) runs outside the CPU-set. Consequently, all processes started via ssh also run outside the allocated cpuset … 🙂
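
Whether a process really ended up inside the intended cpuset can be checked, for instance, as follows (a sketch; it assumes the standard Linux cpuset interface is exposed this way under ProPack):

    # show the cpuset the current shell is attached to
    cat /proc/self/cpuset
    # show the cpuset of an arbitrary process with process id <pid>
    cat /proc/<pid>/cpuset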

Solutions?
* shared-memory mpich does not work as the shm device of mpich does not work with MPMD, i.e. a procgroup file is not supported
* using SGI MPT (SGI’s own MPI) does not work as the binary-only MpCCI library relies on some mpich symbols
* starting the code with mpiexec does not work as there are some problems with accessing stdin from within the application
* …

Module files for STAR-CD available on SGI Altix

Module files for STAR-CD 3.20 and 3.24 have been created on the SGI Altix. As the original source etc/setstar seems to cause problems in PBS batch scripts for some users, it is generally recommended to use the new modules mechanism. The example PBS scripts in /opt/STAR-CD-3.xx have been updated to use the new modules mechanism.

Of course you can use the new mechanism also in your login sessions — it’s not limited to PBS jobs.

On Cluster32, module files are also available for STAR-CD 3.24.
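
For example, typical usage in a login session or at the top of a PBS script could look like this; the exact module name is an assumption and is best checked with module avail first:

    module avail                # list all available modules
    module load star-cd/3.24    # load the STAR-CD 3.24 environment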