Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things


MpCCI on SGI Altix (part 2)

MpCCI is a commercial tool (basically a library) for coupling different numerical codes. It is distributed only in binary form and depends on a number of other tools, in particular on MPI. For the Itanium architecture, only an MPICH-based version is provided right now.
Using a standard MPICH (with the p4 device) on SGI Altix is a rather bad idea: the ssh-based start mechanism does not respect CPUsets, proper clean-up is not guaranteed, etc.

Due to the problems with the ssh-based start mechanism of a standard ch_p4 MPICH, the corresponding mpirun was removed on Sept. 14, 2005! This improves the stability of our SGI Altix system but requires some additional steps from users of MpCCI:

  1. load as usual the module mpcci/3.0.3-ia64-glibc23 or mpcci/3.0.3-ia64-glibc23-intel9 (I hope both still work fine)
  2. compile your code as usual (and as in the past) – MPICHHOME and MPIROOTDIR are automatically set by the MpCCI module
  3. create your MpCCI specific input files
  4. interactively run ccirun -log -norun xxx.cci
  5. edit the generated ccirun.procgroup file:
    • on the first line, you have to add --mpcci-inputfile ccirun.inputfile (see the last line of the ccirun output)
    • on all lines you have to replace the number (either 0 or 1) after the hostname by a colon.
    • a complete ccirun.procgroup file now might look like
      altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile some-path/ccirun.inputfile
      altix : some-path/ccirun.fhp.itanium_mpccimpi.1.spawn
      altix : some-path/ccirun.Binary.2.spawn
      
  6. now prepare your PBS job file; use the following line to start your program – it replaces the previous mpirun line!
    /opt/MpCCI/mpiexec-0.80/bin/mpiexec -mpich-p4-no-shmem -config=ccirun.procgroup
    
  7. and submit your job. The number of CPUs you request must be equal to (or larger than) the number of processes you start, i.e. you also have to count the MpCCI control process!
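
The manual edits from step 5 can also be scripted. The following is only a minimal sketch and not part of MpCCI: the function name fix_procgroup is made up for illustration, and it assumes that every line of the generated ccirun.procgroup has the form "hostname number path" and that the first line belongs to the MpCCI control process. Compare the result with the ccirun output before you use it.

```shell
# Hypothetical helper: print an mpiexec-style version of a ccirun procgroup file.
# $1: procgroup file written by "ccirun -log -norun"
# $2: MpCCI input file (see the last line of the ccirun output)
fix_procgroup() {
    awk -v inputfile="$2" '
        # replace the 0 or 1 after the hostname by a colon
        NF >= 3 && ($2 == "0" || $2 == "1") { $2 = ":" }
        # the first line additionally gets the input file option
        NR == 1 { $0 = $0 " --mpcci-inputfile " inputfile }
        { print }
    ' "$1"
}
```

A possible use is fix_procgroup ccirun.procgroup ccirun.inputfile > ccirun.procgroup.new, followed by a manual check before the new file replaces the original one. Do not redirect the output directly into the file that is being read.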

Some additional remarks:

  • it is not clear at the moment whether the runtime of such jobs can be extended once they are submitted/running. We’ll probably have to check this on an actual run …
  • if your code reads from STDIN, you need an additional step to get it working again:
    • if you have something like read(*,*) x or read *,x you have to set the environment variable FOR_READ to the file which contains the input
    • if you have something like read(5,*) x or read 5,x you have to set the environment variable FORT5 to the file which contains the input
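
In a PBS job file (assuming a Bourne-type shell), the two cases above might look as follows. The file name input.dat is only a placeholder for the file that holds your former STDIN data.

```shell
# input.dat is a placeholder name, not something MpCCI or the compiler expects
INPUT="$PWD/input.dat"

# for read(*,*) x or read *,x
export FOR_READ="$INPUT"

# for read(5,*) x or read 5,x
export FORT5="$INPUT"
```

Setting both variables should do no harm if you are not sure which form of read your code uses.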

Two additional remarks on ccirun.procgroup for mpiexec:

  • for some reason, it seems to be necessary to use only the short hostname (e.g. “altix”) instead of the fully qualified hostname (e.g. altix.rrze.uni-erlangen.de)
  • with some applications, the first line in the procgroup file must be “altix : some-path/ccirun.cci-control.0.spawn --mpcci-inputfile ...”; with other applications, this line must be omitted and the option “--mpcci-inputfile ...” has to be passed to the first actual executable instead

Additional MpCCI remarks: … as we now have two different SGI Altix systems in our batch system, you either have to explicitly request one host using -l host=altix or -l host=altix-batch, or you have to dynamically generate the config file for mpiexec.
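
One way to generate the config file dynamically is to replace the hostname column when the job starts. This is only a sketch under assumptions: set_procgroup_host is a made-up helper name, and the template is assumed to be an already edited ccirun.procgroup with the hostname in the first column of every line.

```shell
# Hypothetical helper: print a procgroup file with the hostname column replaced.
# $1: edited procgroup file used as template
# $2: short hostname of the system the job actually runs on
set_procgroup_host() {
    awk -v host="$2" 'NF >= 2 { $1 = host } { print }' "$1"
}
```

In the job file this could be called as set_procgroup_host ccirun.procgroup "$(hostname -s)" > ccirun.procgroup.run, with -config=ccirun.procgroup.run then passed to mpiexec. Using hostname -s keeps the short hostname that mpiexec seems to need.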

In addition, mpiexec has been upgraded to a newer version. Just use /opt/MpCCI/mpiexec/bin/mpiexec to always get the latest version. The option -mpich-p4-no-shmem is no longer necessary as it is now compiled in as the default.