Attention: OFED disallows system(const char*) or fork/exec after the InfiniBand libraries have been initialized. Some documentation says the following about this:
… the Mellanox InfiniBand driver has issues with buffers sharing pages when fork() is used. Pinned (locked in memory) pages are normally marked copy-on-write during a fork. If a page is pinned before a fork and subsequently written to while RDMA operations are being performed on the same page, silent data corruption can occur as RDMA operations continue to stream data to a page that has moved. To avoid this, the Mellanox driver does not use copy-on-write behavior during a fork for pinned pages. Instead, access to these pages by the child process will result in a segmentation violation.
Fork support from kernel 2.6.12 and above is available provided that applications do not use threads. fork() is supported as long as the parent process does not run before the child exits or calls exec(). The former can be achieved by calling wait(childpid); the latter can be achieved by application-specific means. The POSIX system() call is supported.
Woody is running a SuSE SLES9 kernel, i.e. 2.6.5. Thus, no support for fork and similar things!
Some users have already hit this problem, even a Fortran user who had call system('some command') in his code! In that case, the application just hung in some (matching) MPI_Send/MPI_Recv calls.
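For systems that do meet the conditions quoted above (kernel 2.6.12 or newer, no threads), the "parent does not run before the child exits" rule can be sketched in plain POSIX C as follows. This is only a minimal illustration, not something taken from the OFED documentation: run_command_and_wait is a hypothetical helper name, and the sketch assumes a single-threaded parent. It does not help on Woody's 2.6.5 kernel, where fork is not supported at all once the InfiniBand libraries are initialized.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of OFED or MPI): run an external command
 * in a child process and block the parent until the child has exited.
 * Returns the child's exit status, or -1 if fork/waitpid failed or the
 * child was terminated by a signal.
 */
int run_command_and_wait(const char *file, char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }

    if (pid == 0) {
        /* Child: call exec() right away and touch nothing else. */
        execvp(file, argv);
        _exit(127); /* only reached if exec() failed */
    }

    /* Parent: do no other work until the child is gone. */
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as run_command_and_wait("ls", argv) with argv = {"ls", "-l", NULL} would then take the place of a bare fork/exec pair. The parent blocks in waitpid() immediately after the fork, so it cannot write to pinned pages while the child is still alive.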
It took quite a long time to find a solution for extending the joint license pool with an increased number of licenses for parallel runs. But everything seems to be settled now for the next three years.
Further chairs can join at any time. Of course, the license may only be used for education and scientific research (and not for industrial research or projects). If additional groups join, the total costs will not increase (unless additional license features are required), but the amount each individual group has to pay annually will go down …
Also check some notes if you use STAR-CD on RRZE’s new parallel computer Woody.
Dear HPC users,
we are glad to invite you on July 2nd to the
KONWIHR Results Workshop
3rd Erlangen International High-End-Computing Symposium
which will take place at the University of Erlangen.
The event is jointly organized by Lehrstuhl für Systemsimulation (LSS), Regionales Rechenzentrum Erlangen (RRZE), Bavarian Graduate School in Computational Engineering (BGCE) and Competence Network for Technical, Scientific High Performance Computing in Bavaria (KONWIHR).
The program is as follows:
In the morning, nine Bavarian groups will report on the wide spectrum of HPC activities carried out in the framework of the Competence Network for Technical, Scientific High Performance Computing in Bavaria (KONWIHR).
In the afternoon, five internationally renowned experts will review High-End-Computing from an international perspective and give an outlook on future developments.
Participation is free of charge, but registration is requested so that we can plan better. For further and updated information and the registration form, visit http://www10.informatik.uni-erlangen.de/Misc/EIHECS3/
We are looking forward to seeing you on July 2nd at RRZE!
HPC@RRZE, also on behalf of the other organizers
PS: Additional information, in particular for speakers, can be found in another post on my blog.