Sun Studio has a long tradition and provides many tools beyond the compilers alone. It is available free of charge not only for Solaris (SPARC and x86/x64) but also for Linux (x64). To give it a try, I installed it on one of our systems. Unfortunately, Sun does not (yet) provide the Sun ClusterTools for Linux, so I had to compile an MPI library on my own.
Since we had no experience with Open MPI either, I gave it a try at the same time. Unfortunately, Open MPI (1.2.3) requires some patching to compile with the Sun Studio 12 compilers (as documented in the Open MPI FAQ). Apart from that, there were no problems including the PBS/Torque TM interface and OpenFabrics InfiniBand support.
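For reference, a configure invocation along these lines should produce such a build. The install prefix and the Torque/OFED paths below are placeholders for our site, not values from the post; the compiler driver names are the Sun Studio ones, and `--with-tm`/`--with-openib` are the Open MPI 1.2 options for the TM interface and InfiniBand support.

```shell
# Sketch of an Open MPI 1.2.3 build with the Sun Studio 12 compilers
# (after applying the patches from the Open MPI FAQ).
# CC/CXX/F77/FC select the Sun Studio compiler drivers;
# all paths below are example values -- adjust to your installation.
./configure CC=cc CXX=CC F77=f77 FC=f95 \
    --prefix=/opt/openmpi-1.2.3 \
    --with-tm=/usr/local/torque \
    --with-openib=/usr/local/ofed
make all install
```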
Next question: how do we run MPI programs from within batch jobs? We are used to Pete Wyckoff’s mpiexec, which we extended with a quite powerful wrapper to allow pinning of processes (using PLPA), specification of round-robin or block assignment of processes, partially filled nodes, etc. Open MPI comes with its own PBS TM interface, so the next step will be to figure out how all of this functionality can be provided with Open MPI’s mpirun. So far, I have not found an option to use nodes only partially. But at least there is
--byslot (default) and I quickly got some MPI PingPong numbers between nodes – the numbers look reasonable. …
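To illustrate the two placement modes, here is a sketch assuming a hypothetical allocation of two nodes with four slots each and a program named pingpong (both assumptions of mine, not from the measurements above). Inside a Torque job, mpirun takes the node list from the TM interface automatically.

```shell
# --byslot (the default) fills one node's slots before moving to the next:
#   ranks 0-3 on the first node, ranks 4-7 on the second.
mpirun --byslot -np 8 ./pingpong

# --bynode places ranks round-robin across the nodes instead:
#   ranks 0,2,4,6 on the first node, ranks 1,3,5,7 on the second.
mpirun --bynode -np 8 ./pingpong
```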
A promising start for more explorations in the future.
If you are running many MPI processes on large SMP systems, e.g. Sun’s Niagara 2 systems, you might need to increase the “open files” limit significantly, e.g. by issuing
ulimit -n 32768
Otherwise the MPI startup may fail with messages like
unable to create shared memory mapping.
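In a batch environment the natural place for this is the top of the job script, before mpirun starts. A minimal sketch (the process count and program name are placeholders):

```shell
#!/bin/sh
# Raise the per-process open-files limit before launching the MPI job.
# This only works up to the hard limit (check with: ulimit -Hn);
# raising the hard limit itself needs root, e.g. via
# /etc/security/limits.conf on Linux.
ulimit -n 32768
ulimit -n          # print the soft limit to the job log for verification
mpirun -np 64 ./a.out
```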