MpCCI on SGI Altix (part 2)
MpCCI is a commercial tool (basically a library) which allows coupling different numerical codes. MpCCI is only distributed in binary form and depends on a number of other tools, in particular on MPI. For the Itanium architecture, only an MPICH-based version is provided right now.
Using a standard MPICH (with the p4 device) on SGI Altix is a rather bad idea: the ssh-based start mechanism does not respect CPUsets, proper clean-up is not guaranteed, etc.
Due to the problems related to the ssh-based start mechanism of a standard ch_p4 MPICH, the corresponding mpirun was removed on Sept. 14, 2005! This guarantees better stability of our SGI Altix system but requires some additional steps for users of MpCCI:
- load the module mpcci/3.0.3-ia64-glibc23 or mpcci/3.0.3-ia64-glibc23-intel9 as usual (I hope both still work fine)
- compile your code as usual (and as in the past); MPICHHOME and MPIROOTDIR are automatically set by the MpCCI module
- create your MpCCI-specific input files
- interactively run ccirun -log -norun xxx.cci
- edit the generated ccirun.procgroup file:
  - on the first line, you have to add --mpcci-inputfile ccirun.inputfile (see the last line of the ccirun output)
  - on all lines, you have to replace the number (either 0 or 1) after the hostname by a colon
  - a complete ccirun.procgroup file now might look like

        altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile some-path/ccirun.inputfile
        altix : some-path/ccirun.fhp.itanium_mpccimpi.1.spawn
        altix : some-path/ccirun.Binary.2.spawn

- now prepare your PBS job file; use the following line to start your program (it replaces the previous mpirun line!):

      /opt/MpCCI/mpiexec-0.80/bin/mpiexec -mpich-p4-no-shmem -config=ccirun.procgroup

- and submit your job. The number of CPUs you request must be equal to (or larger than) the number of processes you start, i.e. you have to count the MpCCI control process as well!
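The manual edits to ccirun.procgroup can also be scripted. The following is only a minimal sketch and not part of MpCCI: it assumes that every line written by ccirun has the form "hostname 0-or-1 executable" (inferred from the description above) and that each line starts exactly one process. The function names fix_procgroup and count_procs are made up for illustration.

```shell
# Hypothetical helpers, not shipped with MpCCI.
# Assumed input format per line: "<hostname> <0|1> <executable> [args]".

# Print the converted procgroup file to stdout:
#   - append "--mpcci-inputfile <inputfile>" to the first line
#   - replace the number after the hostname by a colon on every line
# Note: <inputfile> must not contain "|" or "&" (they are special to sed here).
fix_procgroup() {
    local procgroup="$1" inputfile="$2"
    sed -E \
        -e "1s|\$| --mpcci-inputfile ${inputfile}|" \
        -e 's|^([^[:space:]]+)[[:space:]]+[01][[:space:]]+|\1 : |' \
        "$procgroup"
}

# Count the processes (non-empty lines), including the MpCCI control process.
# Assumption: one process per line.
count_procs() {
    grep -c '[^[:space:]]' "$1"
}
```

A possible use would be `fix_procgroup ccirun.procgroup ccirun.inputfile > ccirun.procgroup.new`, followed by `count_procs ccirun.procgroup.new` to see how many CPUs the PBS job has to request at least. Check the result by hand before submitting.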
Some additional remarks:
- it is not clear at the moment whether the runtime of such jobs can be extended once they are submitted/running. We’ll probably have to check this on an actual run …
- if your program reads from STDIN, you need an additional step to get it working again:
  - if you have something like read(*,*) x or read *,x you have to set the environment variable FOR_READ to the file which contains the input
  - if you have something like read(5,*) x or read 5,x you have to set the environment variable FORT5 to the file which contains the input
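In the PBS job file, these variables could be set before the mpiexec line. This is just a sketch: input.dat is a placeholder name, and it has not been checked here whether mpiexec passes the environment on to every process.

```shell
# Placeholder: replace input.dat with the file that holds your program's input.
input_file="$PWD/input.dat"

# For read(*,*) x  or  read *,x
export FOR_READ="$input_file"

# For read(5,*) x  or  read 5,x
export FORT5="$input_file"
```

Setting both does no harm if you are not sure which form of read your code uses.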
Two additional remarks on ccirun.procgroup for mpiexec:
- for some reason, it seems to be necessary to use only the short hostname (e.g. “altix”) instead of the fully qualified hostname (e.g. altix.rrze.uni-erlangen.de)
- with some applications, the first line in the procgroup file must be “altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile ...”; with other applications, this line must be omitted (and the option “--mpcci-inputfile ...” has to be passed to the first actual executable)
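If your application needs the second variant (no control line, option passed to the first actual executable), the file can be derived from the first variant. A sketch under the assumption that line 1 of the file is the cci-control line and line 2 is the first actual executable; drop_control_line is a hypothetical name, not an MpCCI tool.

```shell
# Hypothetical helper: print a procgroup file without the cci-control line
# and with "--mpcci-inputfile <inputfile>" appended to the first remaining line.
# Assumption: line 1 of the input is the cci-control line.
drop_control_line() {
    local procgroup="$1" inputfile="$2"
    awk -v opt="--mpcci-inputfile ${inputfile}" '
        NR == 1 { next }                    # skip the cci-control line
        NR == 2 { print $0 " " opt; next }  # first actual executable gets the option
        { print }                           # all other lines stay unchanged
    ' "$procgroup"
}
```

Which of the two variants a given application needs still has to be found out by trying.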
Additional MpCCI remarks: as we now have two different SGI Altix systems in our batch system, you either have to explicitly request one host using -l host=altix or -l host=altix-batch, or you have to dynamically generate the config file for mpiexec.
In addition, mpiexec has been upgraded to a newer version. Just use /opt/MpCCI/mpiexec/bin/mpiexec to always get the latest version. -mpich-p4-no-shmem is no longer necessary as it is now the compiled-in default.
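One way to generate the config file dynamically is to rewrite the host column when the job starts. Again only a sketch: set_procgroup_host is a hypothetical helper, and it assumes that every line already has the "host : executable" form. The domain part is stripped because only the short hostname seems to work.

```shell
# Hypothetical helper: print the procgroup file with the host column replaced.
# $1 = existing procgroup file in "host : executable" form
# $2 = target host (optional; defaults to the host the job is running on)
set_procgroup_host() {
    local procgroup="$1"
    local host="${2:-$(hostname)}"
    host="${host%%.*}"    # altix.rrze.uni-erlangen.de -> altix
    sed -E "s|^[^[:space:]]+[[:space:]]*:|${host} :|" "$procgroup"
}
```

Inside the job file, something like `set_procgroup_host ccirun.procgroup > ccirun.procgroup.job` followed by mpiexec with -config=ccirun.procgroup.job might then work on either Altix; the output file name is arbitrary.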