A new version of MpCCI (3.0.5) has been released. See www.mpcci.org for more details. Many things have changed; in particular, the MpCCI library itself is now much more encapsulated and should be completely independent of the MPI version used within the application program.
However, owing to licensing problems, the new version could not be tested on the HPC systems of RRZE right away … Well, the solution was rather trivial: the license file did not contain the FQN of the server, so checking out (or even testing) the license from a remote host failed whenever the FQN was used there. The fix was simply to replace the shortened hostname with the full hostname (including domain), or alternatively with the IP address. After restarting the license server, the licensing problems disappeared.
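Assuming MpCCI uses a FlexNet-style license manager, the change described above would look roughly like this in the license file (the hostid and port number here are purely illustrative):

```
# before: short hostname -- remote checkout using the FQN fails
SERVER altix 0019bbd60e3a 27000

# after: full hostname including domain (or alternatively the IP address)
SERVER altix.rrze.uni-erlangen.de 0019bbd60e3a 27000
```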
In the meantime (early August 2006), MpCCI 3.0.5 is also available on the LRZ machines …
However, there still seem to be some problems on the application side, which is probably not a big surprise as the internal structure of MpCCI 3.0.5 and the API have completely changed …
… especially on SGI Altix systems (or other machines where CPUsets are used), a start mechanism based on ssh cannot be used, as processes would escape their CPUsets and “batch” processes would start running in the interactive/boot CPUset …
MpCCI on SGI Altix (3)
Tests using the client-server mode with SGI MPT … a never-ending tragedy:
- link your application with:
ccilink -client -nompilink ..... -lmpi
- produce the procgroup file with
ccirun -server -norun ex2.cci
- run your simulation:
/opt/MpCCI/mpiexec/bin/mpiexec -server &
/opt/MpCCI/mpiexec/bin/mpiexec -config=ccirun_server.procgroup >& s & x1 & x2 < /dev/null
The number of CPUs requested must be at least as large as the number of server processes, i.e. those started with mpiexec.
If you use mpich instead of MPT, all processes have to be started with mpiexec. As a consequence, the number of requested CPUs must be equal to the total number of processes; the PBS-TM interface used by mpiexec does not allow overbooking CPUs.
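Put together, a PBS job for this client-server setup might look like the following sketch (x1 and x2 stand for the client executables as above; the CPU count and log file name are illustrative):

```shell
#!/bin/bash
#PBS -l ncpus=4          # at least as many CPUs as MpCCI server processes
cd "$PBS_O_WORKDIR"

# start the mpiexec daemon and the MpCCI server processes in the background
/opt/MpCCI/mpiexec/bin/mpiexec -server &
/opt/MpCCI/mpiexec/bin/mpiexec -config=ccirun_server.procgroup >& server.log &

# start the client applications (linked with ccilink -client -nompilink)
./x1 &
./x2 < /dev/null
wait
```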
… after many hours of testing: IT SEEMS THAT MpCCI WITH SGI MPT DOES *NOT* WORK RELIABLY AT ALL … mpi_comm_rank==0 on all processes 🙂 despite using the correct mpif.h files and mpiruns for the server and client applications.
My current conclusions:
- MpCCI does not support SGI MPT natively
- using mpich on SGI Altix for all communications is NO option as benchmarks showed that CFD applications are slower by a factor of 2 or more when using mpich instead of MPT
- using MpCCI in client-server mode also seems not to work (see above)
That means MpCCI is currently not usable at all on SGI Altix. Sorry for those who rely on it.
Recently, the SGI Altix at RRZE has been extended. We now have a batch-only system altix-batch (an SGI Altix 3700 with 32 CPUs and 128 GB shared memory) and a front-end system altix (an SGI Altix 330 with 16 CPUs and 32 GB shared memory; 4 CPUs + 8 GB are used as login partition (boot cpuset) – the remaining ones are also used for batch processing).
An important thing to note is that the new machine has only half the amount of memory per CPU compared to the "old" one. As the cpusets introduced with SuSE SLES9/SGI ProPack 4.x do not have all the features known from the old SGI Origin cpusets (in particular, policy kill is missing), the system starts swapping as soon as one process exceeds the amount of memory available in its cpuset. As a result, the complete system becomes unresponsive.
Therefore, it is very important to request the correct machine, or to specify the amount of memory required in addition to the number of CPUs. The amount of interactive work possible is now also much more limited, as we now have a login partition (boot cpuset) which only has access to 4 CPUs and 8 GB of memory!
Check the official web page of the SGI Altix systems at RRZE for more details and the correct syntax for specifying resource requirements.
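For illustration only (the official web page remains authoritative), a PBS Professional request that pins both the host and the memory might contain directives like these:

```shell
#PBS -l host=altix-batch   # or host=altix for the front-end system
#PBS -l ncpus=8
#PBS -l mem=16gb           # roughly 2 GB per CPU on the new machine
```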
MpCCI on SGI Altix (part 2)
MpCCI is a [commercial] tool (basically a library) which allows coupling different numerical codes. MpCCI is only distributed in binary form and depends on a number of other tools, in particular on MPI. For the Itanium architecture, only an mpich-based version is provided right now.
Using a standard MPICH (with the ch_p4 device) on SGI Altix is a rather bad idea as the ssh-based start mechanism does not respect CPUsets, proper clean-up is not guaranteed, etc.
Due to the problems related to the ssh-based start mechanism of a standard ch_p4 MPICH, the corresponding mpirun has been removed on Sept. 14, 2005! This guarantees better stability of our SGI Altix system, but requires some additional steps for users of MpCCI:
- load the module mpcci/3.0.3-ia64-glibc23-intel9 as usual (I hope both still work fine)
- compile your code as usual (and as in the past); environment variables such as MPIROOTDIR are automatically set by the MpCCI module
- create your MpCCI specific input files
- interactively run ccirun -log -norun xxx.cci
- edit the generated ccirun.procgroup file
- now prepare your PBS job file; use the following line to start your program (it replaces the previous mpirun call):
/opt/MpCCI/mpiexec-0.80/bin/mpiexec -mpich-p4-no-shmem -config=ccirun.procgroup
- and submit your job. The number of CPUs you request must be equal to (or larger than) the number of processes you start, i.e. you have to count the MpCCI control process as well!
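A complete PBS job file following the steps above might look like this sketch (the CPU count is illustrative; note the extra CPU for the MpCCI control process):

```shell
#!/bin/bash
#PBS -l ncpus=5   # e.g. 4 application processes + 1 MpCCI control process
cd "$PBS_O_WORKDIR"
module load mpcci/3.0.3-ia64-glibc23-intel9
/opt/MpCCI/mpiexec-0.80/bin/mpiexec -mpich-p4-no-shmem -config=ccirun.procgroup
```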
Some additional remarks:
- it is not clear at the moment whether the runtime of such jobs can be extended once they are submitted/running. We’ll probably have to check this on an actual run …
- if your application reads from STDIN you need an additional step to get it working again:
- if you have something like read(*,*) x or read *,x you have to set the environment variable FOR_READ to the file which contains the input
- if you have something like read(5,*) x or read 5,x you have to set the environment variable FORT5 to the file which contains the input
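For example (input.dat is a made-up file name; FOR_READ and FORT5 are the runtime variables mentioned above):

```shell
# create the input that the Fortran program would normally read from STDIN
cat > input.dat <<'EOF'
3.14
EOF

export FOR_READ=input.dat   # for read(*,*) x  /  read *,x
export FORT5=input.dat      # for read(5,*) x  /  read 5,x

echo "FOR_READ points to: $FOR_READ"
```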
Two additional remarks concerning ccirun.procgroup for mpiexec:
- for some reason, it seems to be necessary to use only the short hostname (e.g. “altix“) instead of the fully qualified hostname
- with some applications, the first line in the procgroup file must be “altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile ...“; with other applications, this line must be omitted (and the option “--mpcci-inputfile ...” has to be passed to the first actual executable)
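Putting the two remarks together, a procgroup file for mpiexec might look like the following sketch (the client executable names are placeholders; drop the first line for applications that require it to be omitted):

```
altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile ...
altix : some-path/client1.exe
altix : some-path/client2.exe
```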
Additional MpCCI remarks: … as we now have two different SGI Altix systems in our batch system, you either have to explicitly request one host using -l host=altix or -l host=altix-batch, or you have to dynamically generate the config file for mpiexec.
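A minimal sketch of such dynamic generation, run at the start of the job (the executable name myapp is a placeholder; note the short hostname, as discussed above):

```shell
# determine the short hostname of the node the job was placed on
HOST=$(hostname -s)

# write the procgroup file for mpiexec on the fly
cat > ccirun.procgroup <<EOF
$HOST : $PWD/myapp
EOF

cat ccirun.procgroup
```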
In addition, mpiexec has been upgraded to a newer version. Just use /opt/MpCCI/mpiexec/bin/mpiexec to always get the latest version.
-mpich-p4-no-shmem is no longer necessary as it is now compiled in as the default.
MpCCI is a library for coupling different codes (e.g. fluid mechanics and aeroacoustics) and exchanging mesh-based data between them. It is developed and sold by the Fraunhofer Institute SCAI (http://www.scai.fraunhofer.de/mpcci.html).
MpCCI is available for several platforms and relies on MPI for communication.
The good news: MpCCI SDK is available for IA64.
The bad news: it relies on mpich
On our SGI Altix system we use PBS Professional as batch queuing system and each running job gets its own CPU-set.
When starting an MpCCI job, a procgroup file is generated and the processes are started via ssh. And that’s exactly the problem: the sshd daemon (started by root at boot time) runs outside the CPU-set. Consequently, all processes started via ssh also run outside the allocated cpuset … 🙂
* shared-memory mpich does not work as the shm device of mpich does not work with MPMD, i.e. a procgroup file is not supported
* using SGI MPT (SGI’s own MPI) does not work as the binary-only MpCCI library relies on some mpich symbols
* starting the code with mpiexec does not work as there are some problems with accessing stdin from within the application
Module files for STAR-CD 3.20 and 3.24 have been created on SGI Altix. As the original source etc/setstar seems to cause problems in PBS batch scripts for some users, it is generally recommended to use the new modules mechanism instead. The example PBS scripts in /opt/STAR-CD-3.xx have been updated accordingly.
Of course you can use the new mechanism also in your login sessions — it’s not limited to PBS jobs.
On Cluster32, module files are also available for STAR-CD 3.24.