Thomas Zeiser

Some comments by Thomas Zeiser about HPC@RRZE and other things


Pinning of MPI processes

RRZE’s mpirun-wrapper, which can be used with Intel-MPI, MPICH, MVAPICH and MVAPICH2, has the option -pin, which enables explicit pinning of processes to certain cores. See mpirun -h on one of RRZE’s cluster systems for details.

Open-MPI cannot (yet) be used with RRZE’s mpirun-wrapper. However, Open-MPI’s mpirun (or mpiexec) already has a lot of very nice features, including support for explicit pinning using the --rankfile xyz command line option. This option works even if the job is running under control of PBS/torque. The only cumbersome task is to create the rankfile. However, you do not need to know how the CPUs are numbered in a multi-core, multi-socket system, as Open-MPI uses logical descriptions, i.e. socket number and core number within the socket. The syntax of the rankfile is as follows (check the Open-MPI manpage of mpirun for details):

          rank 0=w0101 slot=0:0
          rank 1=w0101 slot=0:1
          rank 2=w0101 slot=1:0
          rank 3=w0101 slot=1:1

which binds rank 0 to the first CPU in socket 0, rank 1 to the second CPU of socket 0, etc. Of course the hostname (w0101 in the example) must match the list of nodes you got from the queuing system, and you need one line per MPI rank (i.e. 256 lines if running on 64 nodes with 4 cores each). As ranges of CPUs can also be specified (see manpage for details; section “Specifying Ranks”), this mechanism should also work quite well for hybrid codes (i.e. MPI+OpenMP), although the OpenMP threads are not bound individually to explicit cores but only altogether to groups of cores. Additional effort (e.g. based on the pthread-overload.c library used by RRZE’s pin_omp) would be required to explicitly pin the OpenMP threads of hybrid codes, too.
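Since writing one line per rank by hand quickly gets tedious, the rankfile can be generated by a small script. The following is only a minimal sketch and not an RRZE tool: the function name make_rankfile and its parameters are made up for illustration, and the defaults assume the node layout of the example above (2 sockets with 2 cores each). With cores_per_rank greater than 1 it writes a core range per rank, as one might want for hybrid codes; please check the range syntax against the manpage of your Open-MPI version.

```python
def make_rankfile(hosts, sockets_per_node=2, cores_per_socket=2, cores_per_rank=1):
    """Return rankfile text, filling each host socket by socket.

    All names and defaults here are illustrative assumptions; adjust the
    socket/core counts to the actual hardware of the nodes.
    """
    if cores_per_socket % cores_per_rank != 0:
        raise ValueError("cores_per_rank must divide cores_per_socket")

    # A node list may repeat each host once per core; keep every host
    # only once, in the order in which it first appears.
    unique_hosts = list(dict.fromkeys(hosts))

    lines = []
    rank = 0
    for host in unique_hosts:
        for socket in range(sockets_per_node):
            for first in range(0, cores_per_socket, cores_per_rank):
                last = first + cores_per_rank - 1
                # Single core "0" or a range of cores "0-1" within the socket.
                cores = str(first) if last == first else f"{first}-{last}"
                lines.append(f"rank {rank}={host} slot={socket}:{cores}")
                rank += 1
    return "\n".join(lines) + "\n"


# Example: one node with the layout assumed above.
print(make_rankfile(["w0101"]), end="")
```

For the single host w0101 this reproduces the four lines shown above. Under PBS/torque the host list would typically be read from the file named in $PBS_NODEFILE, and the generated text written to the file that is passed to --rankfile.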

Further information on the usage of pinning on the RRZE clusters is described in (using RRZE’s recommended/supported mpirun-wrapper and Intel-MPI)