MpCCI on SGI Altix (part 2)
MpCCI is a [commercial] tool (basically a library) which allows coupling different numerical codes. MpCCI is only distributed in binary form and depends on a number of other tools, in particular on MPI. For the Itanium architecture, only an MPICH-based version is provided right now.
Using a standard MPICH (with p4 device) on SGI Altix is a rather bad idea as the ssh-based start mechanism does not respect CPUsets, proper clean-up is not guaranteed, etc.
Due to the problems related to the ssh-based start mechanism of a standard ch_p4 MPICH, the corresponding mpirun was removed on Sept. 14, 2005! This guarantees better stability of our SGI Altix system, but it requires some additional steps for users of MpCCI:
- load the module as usual:
  mpcci/3.0.3-ia64-glibc23
  or
  mpcci/3.0.3-ia64-glibc23-intel9
  (I hope both still work fine)
- compile your code as usual (and as in the past); MPICHHOME and MPIROOTDIR are automatically set by the MpCCI module
- create your MpCCI specific input files
- interactively run
ccirun -log -norun xxx.cci
- edit the generated ccirun.procgroup file:
  - on the first line, you have to add
    --mpcci-inputfile ccirun.inputfile
    (see the last line of the ccirun output)
  - on all lines, you have to replace the number (either 0 or 1) after the hostname by a colon
  - a complete ccirun.procgroup file now might look like
    altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile some-path/ccirun.inputfile
    altix : some-path/ccirun.fhp.itanium_mpccimpi.1.spawn
    altix : some-path/ccirun.Binary.2.spawn
- now prepare your PBS job file; use the following line to start your program (it replaces the previous mpirun line!):
  /opt/MpCCI/mpiexec-0.80/bin/mpiexec -mpich-p4-no-shmem -config=ccirun.procgroup
- and submit your job. The number of CPUs you request must be equal to (or larger than) the number of processes you start, i.e. you have to count the MpCCI control process as well!
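The procgroup edits described in the list above can also be scripted. The following is only a minimal sketch: the helper name fix_procgroup is made up for illustration, and it assumes that every relevant line of the generated file has the form "hostname number program [arguments]", which is inferred from the description above and not confirmed by the MpCCI documentation.

```shell
#!/bin/sh
# Sketch: rewrite a ccirun-generated procgroup file for use with mpiexec.
# Assumption (not confirmed by the MpCCI documentation): every relevant line
# has the form "<hostname> <0|1> <program> [arguments...]".

# fix_procgroup PROCGROUP INPUTFILE   (hypothetical helper; prints to stdout)
fix_procgroup() {
    awk -v opt="--mpcci-inputfile $2" '
        NF >= 3 && ($2 == "0" || $2 == "1") {
            $2 = ":"    # replace the number after the hostname by a colon
            # append the option to the first process line only, unless present
            if (!seen && index($0, "--mpcci-inputfile") == 0) $0 = $0 " " opt
            seen = 1
        }
        { print }
    ' "$1"
}

# Example use (file names as in the steps above):
#   fix_procgroup ccirun.procgroup ccirun.inputfile > ccirun.procgroup.new
#   mv ccirun.procgroup.new ccirun.procgroup
```

Lines that do not match the assumed pattern are passed through unchanged, so it is worth comparing the result with the example file shown above before submitting the job.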
Some additional remarks:
- it is not clear at the moment whether the runtime of such jobs can be extended once they are submitted/running. We’ll probably have to check this on an actual run …
- if your program reads from STDIN, you need an additional step to get it working again:
  - if you have something like
    read(*,*) x
    or
    read *,x
    you have to set the environment variable FOR_READ to the file which contains the input
  - if you have something like
    read(5,*) x
    or
    read 5,x
    you have to set the environment variable FORT5 to the file which contains the input
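In a job script, the two variables from the STDIN remark above could be set as follows. This is a sketch only: the file name input.dat is a placeholder, and it assumes that the environment of the job script is passed on to the processes started by mpiexec.

```shell
#!/bin/sh
# Placeholder name of the file that used to be fed to STDIN; replace with your own.
INPUT=input.dat

export FOR_READ="$INPUT"   # for reads like  read(*,*) x  or  read *,x
export FORT5="$INPUT"      # for reads like  read(5,*) x  or  read 5,x
```

These lines would go before the mpiexec line of the PBS job file. Setting both variables is harmless if only one of the two read styles is used.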
two additional remarks – ccirun.procgroup for mpiexec:
- for some reason, it seems to be necessary to use only the short hostname (e.g. “altix”) instead of the fully qualified hostname (e.g. altix.rrze.uni-erlangen.de)
- with some applications, the first line in the procgroup file must be “altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile ...”; with other applications, this line must be omitted (and the option “--mpcci-inputfile ...” has to be passed to the first actual executable)
additional MpCCI remarks: … as we now have two different SGI Altix systems in our batch system, you either have to explicitly request one host using -l host=altix or -l host=altix-batch, or you have to dynamically generate the config file for mpiexec.
In addition, mpiexec has been upgraded to a newer version. Just use /opt/MpCCI/mpiexec/bin/mpiexec to always get the latest version. The option -mpich-p4-no-shmem is no longer necessary as it is compiled in as the default.
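One possible way to generate the mpiexec config file dynamically is to rewrite the hostname column of an already prepared procgroup file inside the job. The sketch below assumes that the file is already in the "hostname : program" form and that the short name of the host running the job script is the right one to use; the helper name retarget_procgroup is made up for illustration.

```shell
#!/bin/sh
# Sketch: put the short name of the current host into a prepared procgroup file.
# Assumption: lines to be changed have the form "<hostname> : <program> [arguments...]".

# retarget_procgroup TEMPLATE [HOST]   (hypothetical helper; prints to stdout)
# HOST defaults to the short hostname of the machine running this script.
retarget_procgroup() {
    awk -v host="${2:-$(hostname -s)}" '
        NF >= 3 && $2 == ":" { $1 = host }   # replace only the hostname column
        { print }
    ' "$1"
}

# Possible use inside a PBS job file (PBS_JOBID is set by PBS):
#   retarget_procgroup ccirun.procgroup > ccirun.procgroup.$PBS_JOBID
#   /opt/MpCCI/mpiexec/bin/mpiexec -config=ccirun.procgroup.$PBS_JOBID
```

Writing the result to a per-job file keeps the prepared ccirun.procgroup untouched, so the same template can be reused on either Altix system.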