<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Thomas Zeiser</title>
	<atom:link href="http://blogs.fau.de/zeiser/feed/" rel="self" type="application/rss+xml" />
	<link>http://blogs.fau.de/zeiser</link>
	<description>Some comments by Thomas Zeiser about HPC@RRZE and other things</description>
	<lastBuildDate>Wed, 24 Oct 2012 14:54:01 +0000</lastBuildDate>
	<language>de-DE</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
		<item>
		<title>Final report of the BMBF project SKALB</title>
		<link>http://blogs.fau.de/zeiser/2012/08/29/abschlussbericht-des-bmbf-projekts-skalb/</link>
		<comments>http://blogs.fau.de/zeiser/2012/08/29/abschlussbericht-des-bmbf-projekts-skalb/#comments</comments>
		<pubDate>Wed, 29 Aug 2012 10:23:17 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[SKALB]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5676</guid>
		<description><![CDATA[Funding for the <strong>SKALB</strong> project (Lattice Boltzmann methods for scalable multi-physics applications), granted under the first <a href="http://www.bmbf.de/foerderungen/11830.php">call for research projects in the field of "HPC software for scalable parallel computers" within the funding programme IKT 2020 – Forschung für Innovationen</a> issued by the BMBF in 2007, ended on 31 December 2011. The official final report is now complete and can be downloaded here as a <a href='http://blogs.fau.de/zeiser/files/2012/08/bmbf-fkz-01IH08003.pdf'>PDF file</a>. The final report should soon also be available through the <a href="http://www.tib.uni-hannover.de/">Technische Informationsbibliothek (TIB) in Hannover</a>.<p>It is particularly impressive that the SKALB funding has so far led to more than 60 publications and nearly 100 talks at scientific events.</p>]]></description>
				<content:encoded><![CDATA[<p>Funding for the <strong>SKALB</strong> project (Lattice Boltzmann methods for scalable multi-physics applications), granted under the first <a href="http://www.bmbf.de/foerderungen/11830.php">call for research projects in the field of &#8220;HPC software for scalable parallel computers&#8221; within the funding programme IKT 2020 – Forschung für Innovationen</a> issued by the BMBF in 2007, ended on 31 December 2011. The official final report is now complete and can be downloaded here as a <a href='http://blogs.fau.de/zeiser/files/2012/08/bmbf-fkz-01IH08003.pdf'>PDF file</a>. The final report should soon also be available through the <a href="http://www.tib.uni-hannover.de/">Technische Informationsbibliothek (TIB) in Hannover</a>.</p>
<p>It is particularly impressive that the SKALB funding has so far led to more than 60 publications and nearly 100 talks at scientific events.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2012/08/29/abschlussbericht-des-bmbf-projekts-skalb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Why cfx5solve from Ansys-13.0 fails on SuSE SLES11SP2 &#8230;</title>
		<link>http://blogs.fau.de/zeiser/2012/08/23/why-cfx5solve-from-ansys-13-0-fails-on-suse-sles11sp2/</link>
		<comments>http://blogs.fau.de/zeiser/2012/08/23/why-cfx5solve-from-ansys-13-0-fails-on-suse-sles11sp2/#comments</comments>
		<pubDate>Thu, 23 Aug 2012 19:46:20 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[HPC-Cluster@RRZE]]></category>
		<category><![CDATA[Software]]></category>
		<category><![CDATA[Ansys]]></category>
		<category><![CDATA[cc2flow]]></category>
		<category><![CDATA[CFX]]></category>
		<category><![CDATA[SLES11]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5663</guid>
		<description><![CDATA[Recently, the operating system of one of RRZE's HPC clusters was upgraded from SuSE SLES10 SP4 to SuSE SLES11 SP2 ... one of the few things which broke due to the OS upgrade is Ansys/CFX-13.0. cfx5solve now aborts with<p>ccl2flow: * command language error *</p><p>Message: getChildList: unable to find the requested path</p><p>Context: returned by cclApi call</p><p>As one can expect, Ansys does not support running Ansys-13.0 on SuSE SLES11 or SLES11 SP2. There are also lots of reports on this error for different <em>unsupported</em> OS versions in the CFX forum at <em>cfd-online</em> but no explanations or workarounds yet.</p>
<p><strong>So, where does the problem come from? A long story starts ...</strong></p>
<p><strong>First guess:</strong> SuSE SLES11 SP2 runs a 3.0 kernel. Thus, there might be some script which does not correctly parse the output of uname. However, the problem persists if cfx5solve is run using uname26 (or the equivalent long setarch variant). On the other hand, the problem does not occur if e.g. a CentOS-5 chroot is started on the SLES11 SP2 kernel, i.e. still the same kernel but an old user space. This clearly indicates that it is not a kernel issue but some library or tool problem.</p>
<p><strong>Next guess:</strong> Perl comes bundled with Ansys/CFX but it might be some other command line tool from the Linux distribution which is used by cfx5solve, e.g. sed and friends or some changed bash behavior. Using strace on cfx5solve reveals several calls of such tools. But actually, none of them is problematic.</p>
<p><strong>Thus, it must be a library issue:</strong> Ansys/CFX comes with most of the libraries it needs bundled but there is always the <em>glibc</em>, i.e. /lib64/ld-linux-x86-64.so.2, /lib64/libc.so.6, etc. SuSE SLES10 used <em>glibc-2.4</em>, RHEL5 uses <em>glibc-2.5</em> but SLES11 SP2 uses <em>glibc-2.11.3</em> ...</p>
<p>The <em>glibc</em> cannot be overridden using LD_LIBRARY_PATH like any other library. But there are ways to do it anyway ...</p>
<p>The error message suggests that ccl2flow.exe is causing the problems. So, let's run that with an old <em>glibc</em> version. As cfx5solve allows specifying a custom <em>ccl2flow</em> binary, we can use a simple shell script to call the actual ccl2flow.exe using the loader and <em>glibc</em> libraries from the CentOS5 <em>glibc-2.5</em>. Nothing changes; still the very same <em>getChildList</em> error message in the <em>out</em> file. Does that mean that ccl2flow.exe is not the bad guy?</p>
<p><strong>Interlude:</strong> Let's see how ccl2flow.exe is called. The shell wrapper for <em>ccl2flow</em> was already there; thus, let's add some <em>echo</em> statements to print the command line arguments and a <em>sleep</em> statement to inspect the working directory. Et voilà. On a good system, a quite long <em>ccl</em> file has just been created before <em>ccl2flow</em> is called; however, on a bad system running the new OS the <em>ccl</em> file is almost empty. Thus, we should not blame ccl2flow.exe but what happens before. Well, before that there is just the Ansys-supplied perl running.</p>
<p><strong>Let's have a closer look at the perl script:</strong> Understanding what the <em>cfx5solve</em> Perl script does seems to be impossible. Even if the Perl script is traced on a good and a bad system there are no real insights. At some point, the bad system does not return an object while the other does. Thus, let's run perl using the old <em>glibc</em> version. That's a little bit more tricky as cfx5solve is not a binary but a shell script which calls another shell script before finally calling an Ansys-supplied perl binary. But one can also manage these additional difficulties. Et voilà, the error message disappeared. What's going on? Perl is running fine but producing different results depending on the <em>glibc</em> version.</p>
<p><strong>Interlude Ansys/CFX-14.0:</strong> This version is officially only supported on SuSE SLES11 but not on SLES11 SP2, if I got it correctly. But it runs fine on SLES11 SP2, too. What Perl version do they use? Exactly the same version, even the very same binary (i.e. both binaries have the same checksum!). Thus, it is not Perl itself but some CFX-specific library it dynamically loads.</p>
<p><strong>End of the story? Not yet, but almost.</strong> Having already spent so much time on the problem, I finally wanted to know which glibc versions are good or evil. I already knew Redhat's glibc-2.5 is good and SuSE's glibc-2.11.3 is evil. Thus, let's try the versions in between using the official sources from <em>ftp.gnu.org/gnu/glibc</em>. Versions &lt;2.10 or so require a fix for the configure script to recognize a modern as or ld as a good version. A few versions do not compile properly at all on my system. But there is <strong>no</strong> bad version; even with the vanilla 2.11.3 there is no CFX error. Only from glibc-2.12.1 onwards does the well-known ccl2flow error appear. Not really surprising. SuSE and other Linux distributors have long lists of patches they apply, including back-ports from newer releases. There are almost 100 SuSE patches included in their version of glibc-2.11.3-17.39.1; no chance to see what they are doing.</p>
<p><strong>My next guess</strong> is that the problem must be a commit between 2.11.3 and 2.12.1 of the official glibc versions. GNU provides a Git repository and git bisect is your friend. This leads to commit <em>f89d2f30</em> from Dec. 2009: <em>Enable multiarch whenever possible.</em> This commit did not change any actual code but only the default configuration parameters. That means the code causing the fault must have been in the sources much earlier. It only surfaced once multi-arch was switched on in 2.12.1 of the vanilla version or earlier in the SuSE version (the <em>spec</em> file contains an --enable-multi-arch line; confirmed).</p>
<p>Going back in history, it finally turns out that <em>glibc</em> commit <em>ab6a873f</em> from Jun 2009 (<em>SSSE3 strcpy/stpcpy for x86-64</em>) is responsible for the problems leading to the failing <em>ccl2flow</em>.</p>
<p>Unfortunately, it is not possible to see if the most recent <em>glibc</em> versions still cause problems as cfx5solve already aborts earlier with some error message (<em>Can't call method "numstring" on an undefined value</em>).</p>
<p>It is also not clear whether it is a glibc error, a problem in one of the CFX libraries, or just a consequence of the tools used when Ansys-13.0 was compiled.</p>
<p><strong>End of the story:</strong> If you are willing to take the risk of getting wrong results, you may make v130/CFX/tools/perl-5.8.0-1/bin/Linux-x86_64/perl use an older <em>glibc</em> version (or one compiled without multi-arch support) and thus avoid the <em>ccl2flow</em> error. But who knows what else fails visibly or behind the scenes. There is an unknown risk of wrong results even if cfx5solve now runs in principle on SuSE SLES11 SP2.</p>
<p>I fully understand that users do not want to switch versions within a running project. Thus, it is really a pity that ISVs force users (and sys admins) to run very old OS versions. SuSE SLES 10 was released in 2006 and will reach end of general support in July 2013; SLES11 was released in March 2009 while Ansys13 was released only in autumn 2010. And we are still supposed to stick with SLES10? It's time to increase the pressure on ISVs or to start developing in-house codes again.</p>]]></description>
				<content:encoded><![CDATA[<p>Recently, the operating system of one of RRZE&#8217;s HPC clusters was upgraded from SuSE SLES10 SP4 to SuSE SLES11 SP2 &#8230; one of the few things which broke due to the OS upgrade is Ansys/CFX-13.0. <code>cfx5solve</code> now aborts with</p>
<blockquote><p>ccl2flow: * command language error *<br />
Message: getChildList: unable to find the requested path<br />
Context: returned by cclApi call</p></blockquote>
<p>As one can expect, Ansys does not support running Ansys-13.0 on SuSE SLES11 or SLES11 SP2. There are also lots of reports on this error for different <em>unsupported</em> OS versions in the CFX forum at <em>cfd-online</em> but no explanations or workarounds yet.</p>
<p><strong>So, where does the problem come from? A long story starts &#8230;</strong></p>
<p><strong>First guess:</strong> SuSE SLES11 SP2 runs a 3.0 kernel. Thus, there might be some script which does not correctly parse the output of <code>uname</code>. However, the problem persists if <code>cfx5solve</code> is run using <code>uname26</code> (or the equivalent long <code>setarch</code> variant). On the other hand, the problem does not occur if e.g. a CentOS-5 chroot is started on the SLES11 SP2 kernel, i.e. still the same kernel but an old user space. This clearly indicates that it is not a kernel issue but some library or tool problem.</p>
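<p>To make the first guess concrete, here is a minimal sketch. The function <code>check_kernel</code> is purely hypothetical (it is not taken from <code>cfx5solve</code>) and only illustrates the kind of fragile version check I was suspecting; the release string is just an example. <code>run_with_uname26</code> shows the long <code>setarch</code> form, assuming a <code>setarch</code> from util-linux that knows <code>--uname-2.6</code>.</p>
<pre><code>#!/bin/bash
# Hypothetical example of a fragile version check, NOT taken from cfx5solve:
# it accepts only kernel releases that start with "2.6.".
check_kernel() {
    case "$1" in
        2.6.*) echo "supported" ;;
        *)     echo "unsupported" ;;
    esac
}

check_kernel "3.0.13-0.27-default"   # prints "unsupported"

# Ruling this out: run a command with the UNAME26 personality so that
# uname reports a 2.6-style release instead of 3.x.
run_with_uname26() {
    setarch "$(uname -m)" --uname-2.6 "$@"
}
# e.g.: run_with_uname26 cfx5solve ...   (solver arguments omitted here)
</code></pre>
<p>If a check like this were the culprit, the solver would start working under <code>run_with_uname26</code>. It did not, which is why the kernel version string was ruled out.</p>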
<p><strong>Next guess:</strong> Perl comes bundled with Ansys/CFX but it might be some other command line tool from the Linux distribution which is used by <code>cfx5solve</code>, e.g. <code>sed</code> and friends or some changed <code>bash</code> behavior. Using <code>strace</code> on <code>cfx5solve</code> reveals several calls of such tools. But actually, none of them is problematic.</p>
<p><strong>Thus, it must be a library issue:</strong> Ansys/CFX comes with most of the libraries it needs bundled but there is always the <em>glibc</em>, i.e. <code>/lib64/ld-linux-x86-64.so.2</code>, <code>/lib64/libc.so.6</code>, etc. SuSE SLES10 used <em>glibc-2.4</em>, RHEL5 uses <em>glibc-2.5</em> but SLES11 SP2 uses <em>glibc-2.11.3</em> &#8230;</p>
<p>The <em>glibc</em> cannot be overridden using <code>LD_LIBRARY_PATH</code> like any other library. But there are ways to do it anyway &#8230;</p>
<p>The error message suggests that <code>ccl2flow.exe</code> is causing the problems. So, let&#8217;s run that with an old <em>glibc</em> version. As <code>cfx5solve</code> allows specifying a custom <em>ccl2flow</em> binary we can use a simple shell script to call the actual <code>ccl2flow.exe</code> using the loader and <em>glibc</em> libraries from the CentOS5 <em>glibc-2.5</em>. Nothing changes; still the very same <em>getChildList</em> error message in the <em>out</em> file. Does that mean that <code>ccl2flow.exe</code> is not the bad guy?</p>
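<p>A wrapper of this kind could look roughly as follows. This is only a sketch: the directory <code>/opt/centos5-glibc</code>, the placeholder path of the real binary and the <code>DRY_RUN</code> switch are assumptions for illustration, not the paths I actually used. The idea is to start the old dynamic loader explicitly and let it resolve <em>glibc</em> from the old tree via its <code>--library-path</code> option.</p>
<pre><code>#!/bin/bash
# Hypothetical locations; adjust to wherever the old glibc was unpacked.
OLD_GLIBC_ROOT="${OLD_GLIBC_ROOT:-/opt/centos5-glibc}"
REAL_CCL2FLOW="${REAL_CCL2FLOW:-/path/to/ccl2flow.exe}"

# run_with_old_glibc BINARY [ARGS...]
# Starts BINARY through the old loader so that the old libc is used.
# With DRY_RUN set, the command is only printed (handy for checking).
run_with_old_glibc() {
    local binary="$1"
    shift
    local loader="$OLD_GLIBC_ROOT/lib64/ld-linux-x86-64.so.2"
    # --library-path replaces LD_LIBRARY_PATH, so append the current value
    # to keep the libraries bundled with the application visible.
    local libpath="$OLD_GLIBC_ROOT/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    if [ -n "$DRY_RUN" ]; then
        echo "$loader --library-path $libpath $binary $*"
    else
        exec "$loader" --library-path "$libpath" "$binary" "$@"
    fi
}

# Last line of the actual wrapper script would be:
#   run_with_old_glibc "$REAL_CCL2FLOW" "$@"
</code></pre>
<p>The same approach works for any other dynamically linked binary, which becomes relevant further down.</p>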
<p><strong>Interlude:</strong> Let&#8217;s see how <code>ccl2flow.exe</code> is called. The shell wrapper for <em>ccl2flow</em> was already there; thus, let&#8217;s add some <em>echo</em> statements to print the command line arguments and a <em>sleep</em> statement to inspect the working directory. Et voilà. On a good system, a quite long <em>ccl</em> file has just been created before <em>ccl2flow</em> is called; however, on a bad system running the new OS the <em>ccl</em> file is almost empty. Thus, we should not blame <code>ccl2flow.exe</code> but what happens before. Well, before that there is just the Ansys-supplied <code>perl</code> running.</p>
<p><strong>Let&#8217;s have a closer look at the perl script:</strong> Understanding what the <em>cfx5solve</em> Perl script does seems to be impossible. Even if the Perl script is traced on a good and a bad system there are no real insights. At some point, the bad system does not return an object while the other does. Thus, let&#8217;s run <code>perl</code> using the old <em>glibc</em> version. That&#8217;s a little bit more tricky as <code>cfx5solve</code> is not a binary but a shell script which calls another shell script before finally calling an Ansys-supplied <code>perl</code> binary. But one can also manage these additional difficulties. Et voilà, the error message disappeared. What&#8217;s going on? Perl is running fine but producing different results depending on the <em>glibc</em> version.</p>
<p><strong>Interlude Ansys/CFX-14.0:</strong> This version is officially only supported on SuSE SLES11 but not on SLES11 SP2, if I got it correctly. But it runs fine on SLES11 SP2, too. What Perl version do they use? Exactly the same version, even the very same binary (i.e. both binaries have the same checksum!). Thus, it is not Perl itself but some CFX-specific library it dynamically loads.</p>
<p><strong>End of the story? Not yet, but almost.</strong> Having already spent so much time on the problem, I finally wanted to know which glibc versions are good or evil. I already knew Redhat&#8217;s glibc-2.5 is good and SuSE&#8217;s glibc-2.11.3 is evil. Thus, let&#8217;s try the versions in between using the official sources from <em>ftp.gnu.org/gnu/glibc</em>. Versions &lt;2.10 or so require a fix for the configure script to recognize a modern <code>as</code> or <code>ld</code> as a good version. A few versions do not compile properly at all on my system. But there is <strong>no</strong> bad version; even with the vanilla 2.11.3 there is no CFX error. Only from glibc-2.12.1 onwards does the well-known ccl2flow error appear. Not really surprising. SuSE and other Linux distributors have long lists of patches they apply, including back-ports from newer releases. There are almost 100 SuSE patches included in their version of glibc-2.11.3-17.39.1; no chance to see what they are doing.</p>
<p><strong>My next guess</strong> is that the problem must be a commit between 2.11.3 and 2.12.1 of the official glibc versions. GNU provides a Git repository and <code>git bisect</code> is your friend. This leads to commit <em>f89d2f30</em> from Dec. 2009: <em>Enable multiarch whenever possible.</em> This commit did not change any actual code but only the default configuration parameters. That means the code causing the fault must have been in the sources much earlier. It only surfaced once multi-arch was switched on in 2.12.1 of the vanilla version or earlier in the SuSE version (the <em>spec</em> file contains an <code>--enable-multi-arch</code> line; confirmed).</p>
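<p>For readers who have never used it: <code>git bisect</code> is essentially a binary search over the commit history. The following toy sketch shows the idea on plain integers. The function <code>is_bad</code> is a made-up stand-in for "build this glibc revision and check whether <code>cfx5solve</code> fails", and the index 5 is arbitrary. The actual <code>git</code> commands are given as comments, assuming the release tags are named as shown.</p>
<pre><code>#!/bin/bash
# Stand-in for the real (expensive) test of one revision.
# Here we simply pretend that everything from index 5 on is broken.
is_bad() {
    [ "$1" -ge 5 ]
}

# first_bad GOOD BAD
# Binary search for the first failing index, assuming that index GOOD
# passes, index BAD fails, and there is a single good-to-bad transition.
first_bad() {
    local good="$1" bad="$2" mid
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if is_bad "$mid"; then
            bad="$mid"
        else
            good="$mid"
        fi
    done
    echo "$bad"
}

first_bad 0 12   # prints 5

# With git, the bookkeeping is done for you:
#   git bisect start
#   git bisect bad  glibc-2.12.1
#   git bisect good glibc-2.11.3
#   ... build and test the checked-out revision, then tell git the result
#   with "git bisect good" or "git bisect bad"; repeat until git names
#   the first bad commit.
</code></pre>
<p>With a few hundred commits between two releases, this needs fewer than ten build-and-test cycles.</p>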
<p>Going back in history, it finally turns out that <em>glibc</em> commit <em>ab6a873f</em> from Jun 2009 (<em>SSSE3 strcpy/stpcpy for x86-64</em>) is responsible for the problems leading to the failing <em>ccl2flow</em>.</p>
<p>Unfortunately, it is not possible to see if the most recent <em>glibc</em> versions still cause problems as cfx5solve already aborts earlier with some error message (<em>Can&#8217;t call method &#8220;numstring&#8221; on an undefined value</em>).</p>
<p>It is also not clear whether it is a glibc error, a problem in one of the CFX libraries, or just a consequence of the tools used when Ansys-13.0 was compiled.</p>
<p><strong>End of the story:</strong> If you are willing to take the risk of getting wrong results, you may make <code>v130/CFX/tools/perl-5.8.0-1/bin/Linux-x86_64/perl</code> use an older <em>glibc</em> version (or one compiled without multi-arch support) and thus avoid the <em>ccl2flow</em> error. But who knows what else fails visibly or behind the scenes. There is an unknown risk of wrong results even if <code>cfx5solve</code> now runs in principle on SuSE SLES11 SP2.</p>
<p>I fully understand that users do not want to switch versions within a running project. Thus, it is really a pity that ISVs force users (and sys admins) to run very old OS versions. SuSE SLES 10 was released in 2006 and will reach end of general support in July 2013; SLES11 was released in March 2009 while Ansys13 was released only in autumn 2010. And we are still supposed to stick with SLES10? It&#8217;s time to increase the pressure on ISVs or to start developing in-house codes again.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2012/08/23/why-cfx5solve-from-ansys-13-0-fails-on-suse-sles11sp2/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Additional throughput nodes added to Woody cluster</title>
		<link>http://blogs.fau.de/zeiser/2011/12/17/additional-throughput-nodes-added-to-woody-cluster/</link>
		<comments>http://blogs.fau.de/zeiser/2011/12/17/additional-throughput-nodes-added-to-woody-cluster/#comments</comments>
		<pubDate>Sat, 17 Dec 2011 10:49:52 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[HPC-Cluster@RRZE]]></category>
		<category><![CDATA[Woody-Cluster]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5609</guid>
		<description><![CDATA[<p>Recently, 40 additional nodes with an aggregate AVX-Linpack performance of 4 TFlop/s have been added to RRZE's Woody cluster. The nodes were bought by RRZE and ECAP and shall provide additional resources especially for sequential and single-node throughput calculations. Each node is a single-socket system with one of Intel's latest "SandyBridge" 4-core CPUs (Xeon E3-1200 series), 8 GB of main memory, currently no hard disk (and thus no swap) and GBit ethernet.</p>
<p><strong>Current status:</strong> most of the new nodes are available for general batch processing; the configuration and software environment have stabilized</p>
<p><strong>Open problems:</strong></p><p>no known ones</p>
<p><strong>User visible changes and solved problems:</strong></p>
<p>End of April 2012: all the new w10xx nodes have meanwhile received their hard disks and have been reinstalled with SLES10 to match the old w0xx nodes</p>
<p>The module command was not available in PBS batch jobs; <strong>fixed</strong> since 2011-12-17 by patching /etc/profile to always source bashrc even in non-interactive shells</p>
<p>The environment variable $HOSTNAME was not defined. <strong>Fixed</strong> since 2011-12-19 via csh.cshrc.local and bash.bashrc.local.</p>
<p>SMT disabled on all nodes (since 2011-12-19). All visible cores are physical cores.</p>
<p>qsub is now generally wrapped - but that should be completely transparent for users (2012-01-16).</p>
<p>/wsfs = $FASTTMP is now available, too (2012-01-23)</p>
<p><strong>Configuration notes:</strong></p>
<p>The additional nodes are named w10xx.</p>
<p>The base operating system was originally Ubuntu 10.04 LTS; it is now SuSE SLES10 as on the rest of Woody.</p>
<p>The diskless images were originally provisioned using Perceus; the nodes are now set up via autoinstall + cfengine.</p>
<p>(No longer applies) This was different from the rest of Woody, which has stateful SuSE SLES10SP4.</p>
<p>(No longer applies) However, Tiny* for example also uses Ubuntu 10.04 (but in a stateful installation) and binaries should run on SLES and Ubuntu without recompilation.</p>
<p>(No longer applies) The w10xx nodes have python-2.6 while the other w0xxx nodes have python-2.4. You can load the python/2.7.1 module to ensure a common Python environment.</p>
<p>(No longer applies) Compilation of C++ code on the compute nodes using one of RRZE's gcc modules will probably fail; however, we never guaranteed that compiling on any compute nodes works; either use the system g++, compile on the frontend nodes, or ...</p>
<p>The PBS daemon (pbs_mom) running on the additional nodes is much newer than on the old Woody nodes (2.5.9 vs. 2.3.x?); but the difference should not be visible for users.</p>
<p>Each PBS job runs in a <em>cpuset</em>. Thus, you only have access to the CPUs assigned to you by the queuing system. Memory, however, is not partitioned. Thus, make sure that you use less than 2 GB per requested core, as memory limits cannot be enforced.</p>
<p>(No longer applies) As the w10xx nodes currently do not have any local hard disk, they are also operated without swap. Thus, the virtual address space and the physically allocated memory of all processes must not exceed 7.2 GB in total. Also /tmp and /scratch are part of the main memory. Stdout and stderr of PBS jobs are also first spooled to main memory before they are copied to the final destination after the job ended.</p>
<p>Multi-node jobs are not supported on the new nodes as they are a throughput component.</p>
<p><strong>Queue configuration / how to submit jobs:</strong></p>
<p>The old w0xx nodes got the properties :c2 (as they are Intel Core2-based) and :any. The additional w10xx nodes got the properties :sb (as they are Intel SandyBridge-based) and :any.</p>
<p><strong>Multi-node</strong> jobs (-lnodes=X:ppn=4 or -lnodes=X:ppn=4:c2 with X&gt;1) are only eligible for the old w0xx nodes. :c2 will be added automatically if not present. Multi-node jobs which ask for :sb or :any are rejected.</p>
<p><strong>Single-node</strong> jobs (-lnodes=1:ppn=4) by default will also only access the old w0xx nodes, i.e. :c2 will be added automatically if no node property is given. Thus, -lnodes=1:ppn=4 is identical to requesting -lnodes=1:ppn=4:c2.</p>
<p>Single-node jobs which specify :sb (i.e. -lnodes=1:ppn=4:sb) will only go to the new w10xx nodes.</p>
<p>Jobs with :any (i.e. -lnodes=1:ppn=4:any) will run on any available node.</p>
<p><strong>Single-core</strong> jobs (-lnodes=1:ppn=Y:sb with Y&lt;4, i.e. requesting less than a complete node) are only supported on the new w10xx nodes. Specifying :sb is mandatory.</p>
<p><strong>Technical details:</strong></p>
<p>PBS routing originally did not work as expected for jobs where the resource requests are given on the command line (e.g. qsub -lnodes=1:ppn=4 job.sh caused trouble).</p>
<p><em>Some technical background: (1) the torque-submitfilter cannot modify the resource requests given on the command line and (2) routing queues cannot add node properties to resource requests any more; thus, for this type of job, routing to the old nodes does not seem to be possible ... Using distinct queues for the old and new nodes has the disadvantage that jobs cannot ask for "any available CPU". Moreover, the maui scheduler does not support multi-dimensional throttling policies, i.e. has problems if one user submits jobs to different queues at the same time.</em></p>
<p><em>The solution probably is a wrapper around qsub as suggested in the <a href="http://www.supercluster.org/pipermail/torqueusers/2008-May/007365.html">Torque mailing list back in May 2008</a>. At RRZE we already use qsub-wrappers for e.g. qsub.tinyblue. Duplicating some of the logic of the submit filter into the submit wrapper is not really elegant but seems to be the only solution right now. (As a side note: interactive jobs do not seem to suffer from the problem as there is special handling in the qsub source code which writes the command line arguments to a temporary file which is subject to processing by the submit filter.)</em></p>]]></description>
				<content:encoded><![CDATA[<p>Recently, 40 additional nodes with an aggregate AVX-Linpack performance of 4 TFlop/s have been added to RRZE&#8217;s Woody cluster. The nodes were bought by RRZE and ECAP and shall provide additional resources especially for sequential and single-node throughput calculations. Each node is a single-socket system with one of Intel&#8217;s latest &#8220;SandyBridge&#8221; 4-core CPUs (Xeon E3-1200 series), 8 GB of main memory, currently no hard disk (and thus no swap) and GBit ethernet.</p>
<p><strong>Current status:</strong> most of the new nodes are available for general batch processing; the configuration and software environment have stabilized</p>
<p><strong>Open problems:</strong></p>
<ul>
<li>no known ones</li>
</ul>
<p><strong>User visible changes and solved problems:</strong></p>
<ul>
<li>End of April 2012: all the new w10xx nodes have meanwhile received their hard disks and have been reinstalled with SLES10 to match the old w0xx nodes</li>
<li>The <code>module</code> command was not available in PBS batch jobs; <strong>fixed</strong> since 2011-12-17 by patching <code>/etc/profile</code> to always source <code>bashrc</code> even in non-interactive shells</li>
<li>The environment variable <code>$HOSTNAME</code> was not defined. <strong>Fixed</strong> since 2011-12-19 via csh.cshrc.local and bash.bashrc.local.</li>
<li>SMT disabled on all nodes (since 2011-12-19). All visible cores are physical cores.</li>
<li><code>qsub</code> is now generally wrapped &#8211; but that should be completely transparent for users (2012-01-16).</li>
<li><code>/wsfs</code> = <code>$FASTTMP</code> is now available, too (2012-01-23)</li>
</ul>
<p><strong>Configuration notes:</strong></p>
<ul>
<li>The additional nodes are named <code>w10xx</code>.</li>
<li>The base operating system is <s>Ubuntu 10.04 LTS.</s> SuSE SLES10 as on the rest of Woody.
<ul>
<li><s>The diskless images provisioned using Perceus.</s> Autoinstall + cfengine</li>
<li><s>This is different to the rest of Woody which has stateful SuSE SLES10SP4.</s></li>
<li><s>However, Tiny* for example also uses Ubuntu 10.04 (but in a stateful installation) and binaries should run on SLES and Ubuntu without recompilation.</s></li>
</ul>
</li>
<li><s>The w10xx nodes have python-2.6 while the other w0xxx nodes have python-2.4. You can load the python/2.7.1 module to ensure a common Python environment.</s></li>
<li><s>compilation of C++ code on the compute nodes using one of RRZE&#8217;s gcc modules will probably fail; however, we never guaranteed that compiling on any compute nodes works; either use the system g++, compile on the frontend nodes, or &#8230;</s></li>
<li>The PBS daemon (<code>pbs_mom</code>) running on the additional nodes is much newer than on the old Woody nodes (2.5.9 vs. 2.3.x?); but the difference should not be visible for users.</li>
<li>Each PBS job runs in a <em>cpuset</em>. Thus, you only have access to the CPUs assigned to you by the queuing system. Memory, however, is not partitioned. Thus, make sure that you use less than 2 GB per requested core, as memory limits cannot be enforced.</li>
<li><s>As the w10xx nodes currently do not have any local harddisk, they are also operated without swap. Thus, the virtual address space and the physically allocated memory of all processes must not exceed 7.2 GB in total. Also /tmp and /scratch are part of the main memory. Stdout and stderr of PBS jobs are also first spooled to main memory before they are copied to the final destination after the job ended.</s></li>
<li>Multi-node jobs are not supported on the new nodes as they are a throughput component.</li>
</ul>
<p><strong>Queue configuration / how to submit jobs:</strong></p>
<ul>
<li>The old w0xx nodes got the properties <code>:c2</code> (as they are Intel Core2-based) and <code>:any</code>.<br />The additional w10xx nodes got the properties <code>:sb</code> (as they are Intel SandyBridge-based) and <code>:any</code>.</li>
<li><strong>Multi-node</strong> jobs (<code>-lnodes=X:ppn=4</code> or <code>-lnodes=X:ppn=4:c2</code> with X&gt;1) are only eligible for the old w0xx nodes. <code>:c2</code> will be added automatically if not present.<br />Multi-node jobs which ask for <code>:sb</code> or <code>:any</code> are rejected.</li>
<li><strong>Single-node</strong> jobs (<code>-lnodes=1:ppn=4</code>) by default also will only access the old w0xx nodes, i.e. <code>:c2</code> will be added automatically if no node property is given. Thus, <code>-lnodes=1:ppn=4</code> is identical to requesting <code>-lnodes=1:ppn=4:c2</code>.<br />
Single-node jobs which specify <code>:sb</code> (i.e. <code>-lnodes=1:ppn=4:sb</code>) will only go to the new w10xx nodes.<br />
Jobs with <code>:any</code> (i.e. <code>-lnodes=1:ppn=4:any</code>) will run on any available node.</li>
<li><strong>Single-<u>core</u></strong> jobs (<code>-lnodes=1:ppn=Y:sb</code> with Y&lt;4, i.e. requesting less than a complete node) are only supported on the new w10xx nodes. Specifying <code>:sb</code> is mandatory.</li>
</ul>
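<p>The rules above can be summarised in a few lines of shell. This is only an illustrative sketch and not the actual RRZE <code>qsub</code> wrapper: the function name <code>resolve_nodes</code> is made up, and treating a partial-node request without <code>:sb</code> as rejected is my assumption here, as the list only says that <code>:sb</code> is mandatory.</p>
<pre><code>#!/bin/bash
# resolve_nodes NODES PPN [PROPERTY]
# Prints the effective node request or "rejected".
# PROPERTY is given without the leading colon: c2, sb or any.
resolve_nodes() {
    local nodes="$1" ppn="$2" prop="$3"
    if [ "$nodes" -gt 1 ]; then
        # Multi-node jobs: only the old w0xx nodes; :c2 is added if missing.
        case "$prop" in
            ""|c2) echo "nodes=$nodes:ppn=$ppn:c2" ;;
            *)     echo "rejected" ;;
        esac
    elif [ "$ppn" -lt 4 ]; then
        # Less than a complete node: only on the new w10xx nodes.
        if [ "$prop" = "sb" ]; then
            echo "nodes=1:ppn=$ppn:sb"
        else
            echo "rejected"   # assumption, see the text above
        fi
    else
        # Complete single node: default to the old nodes (:c2).
        echo "nodes=1:ppn=$ppn:${prop:-c2}"
    fi
}

resolve_nodes 1 4        # prints nodes=1:ppn=4:c2
resolve_nodes 1 4 any    # prints nodes=1:ppn=4:any
resolve_nodes 4 4 sb     # prints rejected
resolve_nodes 1 2 sb     # prints nodes=1:ppn=2:sb
</code></pre>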
<p><strong>Technical details:</strong></p>
<ul>
<li>PBS routing originally did not work as expected for jobs whose resource requests are given on the command line (e.g. <code>qsub -lnodes=1:ppn=4 job.sh</code> caused trouble).<br />
<em>Some technical background: (1) the torque-submitfilter cannot modify resource requests given on the command line, and (2) routing queues can no longer add node properties to resource requests. Thus, routing this type of job to the old nodes does not seem to be possible &#8230; Using distinct queues for the old and new nodes has the disadvantage that jobs cannot ask for &#8220;any available CPU&#8221;. Moreover, the maui scheduler does not support multi-dimensional throttling policies, i.e. it has problems if one user submits jobs to different queues at the same time.</em><br />
<em>The solution probably is a wrapper around <code>qsub</code> as suggested on the <a href="http://www.supercluster.org/pipermail/torqueusers/2008-May/007365.html">Torque mailing list back in May 2008</a>. At RRZE we already use qsub wrappers, e.g. <code>qsub.tinyblue</code>. Duplicating some of the logic of the submit filter in the submit wrapper is not really elegant but seems to be the only solution right now. (As a side note: interactive jobs do not seem to suffer from the problem as there is special handling in the qsub source code which writes the command line arguments to a temporary file which is then processed by the submit filter.)</em></li>
</ul>
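<p>The node-property rules listed above could be mirrored in such a qsub wrapper roughly as follows. This is only a sketch and not the wrapper actually used at RRZE: the function name <code>normalize_nodes_spec</code> is hypothetical, it only handles a numeric node count (no host lists), it assumes that a missing <code>ppn</code> means one core, and it leaves out parsing the real <code>qsub</code> command line and calling the real <code>qsub</code>.</p>

```shell
#!/bin/bash
# Hypothetical helper for a qsub wrapper: applies the Woody node-property
# rules to a spec such as "2:ppn=4" (the part after "-lnodes=").
# Prints the normalized spec, or returns 1 if the request must be rejected.
normalize_nodes_spec() {
    local spec="$1" nodes ppn prop=""
    nodes="${spec%%:*}"                       # leading node count
    ppn="$(echo "$spec" | sed -n 's/.*ppn=\([0-9]*\).*/\1/p')"
    case "$spec" in
        *:c2*)  prop="c2" ;;
        *:sb*)  prop="sb" ;;
        *:any*) prop="any" ;;
    esac
    if [ "$nodes" -gt 1 ]; then
        # multi-node jobs are only eligible for the old w0xx nodes
        if [ "$prop" = "sb" ] || [ "$prop" = "any" ]; then return 1; fi
    elif [ "${ppn:-1}" -lt 4 ]; then
        # single-core jobs: specifying :sb is mandatory
        # (assumption: a missing ppn is treated as one core)
        if [ "$prop" != "sb" ]; then return 1; fi
    fi
    # no property given: default to the old nodes
    if [ -z "$prop" ]; then spec="${spec}:c2"; fi
    echo "$spec"
}

normalize_nodes_spec 1:ppn=4
```

<p>A wrapper built around this idea would rewrite the <code>-lnodes=</code> argument with the printed value before handing all arguments to the real <code>qsub</code>, and print an error message when the function returns a non-zero status.</p>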
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2011/12/17/additional-throughput-nodes-added-to-woody-cluster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>STAR-CCM+ fails with &#8220;mpid: Not enough shared memory&#8221;</title>
		<link>http://blogs.fau.de/zeiser/2011/07/22/star-ccm-fails-with-mpid-not-enough-shared-memory/</link>
		<comments>http://blogs.fau.de/zeiser/2011/07/22/star-ccm-fails-with-mpid-not-enough-shared-memory/#comments</comments>
		<pubDate>Fri, 22 Jul 2011 12:34:54 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[STAR-CD/STAR-CCM+]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5605</guid>
		<description><![CDATA[<p>If STAR-CCM+ fails on large shared memory nodes with the message "mpid: Not enough shared memory", your sysadmin might need to increase the kernel limit SHMMAX (<em>maximum size of a shared memory segment in bytes</em>), i.e. sysctl -w kernel.shmmax=.... In particular, the Ubuntu/Debian default of 32 MB seems to be too small even for 2-socket nodes with 8-core AMD Opteron processors, i.e. 16 cores/node ...</p>]]></description>
				<content:encoded><![CDATA[<p>If STAR-CCM+ fails on large shared memory nodes with the message &#8220;mpid: Not enough shared memory&#8221;, your sysadmin might need to increase the kernel limit SHMMAX (<em>maximum size of a shared memory segment in bytes</em>), i.e. <code>sysctl -w kernel.shmmax=...</code>. In particular, the Ubuntu/Debian default of 32 MB seems to be too small even for 2-socket nodes with 8-core AMD Opteron processors, i.e. 16 cores/node &#8230;</p>
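<p>Before contacting the sysadmin, the current limit can be read from <code>/proc/sys/kernel/shmmax</code> on a Linux node. The following sketch compares it with a target value; the 1 GiB target and the function name <code>shmmax_too_small</code> are assumptions made for illustration only, as the value STAR-CCM+ actually needs depends on the node and the case.</p>

```shell
#!/bin/bash
# Returns 0 (true) if the current SHMMAX value in bytes is below the required
# value. awk does the comparison because recent kernels may report values
# that exceed the range of shell integer arithmetic.
shmmax_too_small() {
    awk -v cur="$1" -v req="$2" 'BEGIN { exit (cur + 0 >= req + 0) }'
}

# Assumed target of 1 GiB; adjust to what the application really needs.
required=$((1024 * 1024 * 1024))
if [ -r /proc/sys/kernel/shmmax ]; then
    current="$(cat /proc/sys/kernel/shmmax)"
    if shmmax_too_small "$current" "$required"; then
        echo "kernel.shmmax=$current is below $required; ask your sysadmin to raise it"
    else
        echo "kernel.shmmax=$current looks sufficient"
    fi
fi
```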
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2011/07/22/star-ccm-fails-with-mpid-not-enough-shared-memory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Selecting a GPU with CUDA_VISIBLE_DEVICES</title>
		<link>http://blogs.fau.de/zeiser/2011/02/09/selecting-a-gpu-with-cuda_visible_devices/</link>
		<comments>http://blogs.fau.de/zeiser/2011/02/09/selecting-a-gpu-with-cuda_visible_devices/#comments</comments>
		<pubDate>Wed, 09 Feb 2011 09:59:40 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[HPC-Cluster@RRZE]]></category>
		<category><![CDATA[TinyGPU]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5599</guid>
		<description><![CDATA[<p>The environment variable CUDA_VISIBLE_DEVICES lists which devices are visible as a comma-separated string. E.g. export CUDA_VISIBLE_DEVICES=0,3 will select a C2070 and a C2050 on the <em>tg010</em> compute node of TinyGPU.</p>]]></description>
				<content:encoded><![CDATA[<p>The environment variable <code>CUDA_VISIBLE_DEVICES</code> lists which devices are visible as a comma-separated string. E.g. <code>export CUDA_VISIBLE_DEVICES=0,3</code> will select a C2070 and a C2050 on the <em>tg010</em> compute node of TinyGPU.</p>
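<p>In a job script the variable is typically exported before the application starts. The sketch below also counts the entries of the list, which is the number of GPUs the application will see; the helper name <code>count_visible_gpus</code> is hypothetical. Inside the application the selected devices are normally renumbered starting from 0, so physical devices 0 and 3 would appear as devices 0 and 1.</p>

```shell
#!/bin/bash
# Counts the entries of a comma-separated device list such as "0,3".
# An empty list yields 0.
count_visible_gpus() {
    local list="$1"
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    local IFS=','
    set -- $list          # split the list at the commas
    echo "$#"
}

export CUDA_VISIBLE_DEVICES=0,3
echo "GPUs visible to the application: $(count_visible_gpus "$CUDA_VISIBLE_DEVICES")"
```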
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2011/02/09/selecting-a-gpu-with-cuda_visible_devices/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Getting started on LiMa</title>
		<link>http://blogs.fau.de/zeiser/2010/10/26/getting-started-on-lima/</link>
		<comments>http://blogs.fau.de/zeiser/2010/10/26/getting-started-on-lima/#comments</comments>
		<pubDate>Tue, 26 Oct 2010 16:00:44 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[HPC-Cluster@RRZE]]></category>
		<category><![CDATA[LiMa]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5548</guid>
		<description><![CDATA[<p>Early friendly user access was enabled on LiMa at the end of October 2010. The system and user environment are still work in progress. Here are a few notes ("FAQs") describing specialties of the present configuration and major changes during the early days of operation ...</p>]]></description>
				<content:encoded><![CDATA[<p>Early friendly user access was enabled on LiMa at the end of October 2010. The system and user environment are still work in progress. Here are a few notes (&#8220;FAQs&#8221;) describing specialties of the present configuration and major changes during the early days of operation &#8230;</p>
<h3>What are the hardware specifications?</h3>
<ul>
<li>2 login nodes (lima1/lima2)
<ul>
<li>two sockets with Intel Westmere X5650 (2.66 GHz) processors; 12 physical cores, 24 logical cores per node</li>
<li>48 GB DDR3 memory (plus 48 GB swap)</li>
<li>500 GB in <code>/scratch</code> on a local hard disk</li>
</ul>
</li>
<li>500 compute nodes (Lrrnn)
<ul>
<li>two sockets with Intel Westmere X5650 (2.66 GHz) processors; 12 physical cores, 24 logical cores per node</li>
<li>24 GB DDR3 memory (NO swap) - roughly 22 GB available for user applications as the rest is used by the OS in diskless operation</li>
<li>NO local hard disk</li>
<li>QDR InfiniBand</li>
</ul>
</li>
<li>parallel filesystem (/lxfs)
<ul>
<li>ca. 100 TB capacity</li>
<li>up to 3 GB/s of aggregated bandwidth</li>
</ul>
</li>
<li>OS: CentOS 5.5</li>
<li>batch system: torque/maui (as also on the other RRZE systems)</li>
</ul>
<h3>Where / how should I login?</h3>
<p>SSH to <code>lima.rrze.uni-erlangen.de</code> and you will end up on one of the two login nodes. As usual, these login nodes are only accessible from within the University network. A VPN split tunnel might not be enough to be on the University network, as some of the University's private IP addresses (e.g. those used by the HPC systems) are not in the default list of networks routed through the split tunnel. In case of problems, log into <code>cshpc.rrze.uni-erlangen.de</code> first.</p>
<h3>I have some specific problems on LiMa</h3>
<p>First of all, check whether the issue is already described in the online version of this article. If not, contact hpc-support@rrze with as much information as possible.</p>
<h3>I'd like to contribute some documentation</h3>
<p>Please add a comment to this article (log into the blog system using your IDM account/password, which is not necessarily identical to your HPC account; all FAU students and staff should have an IDM account) or send an email to hpc@rrze.</p>
<h4>There are almost no <em>modules</em> visible (Update 2010-11-25)</h4>
<p><s>For now, please execute <code>source /apps/rrze/etc/use-rrze-modules.csh</code> (for csh/tcsh) or <code>. /apps/rrze/etc/use-rrze-modules.sh</code> (for bash) to initialize the RRZE environment. This command will also set some environment variables like <code>WOODYHOME</code> or <code>FASTTMP</code>.</s></p>
<p><s>Once the system goes into regular operation, this step will no longer be required.</s></p>
<p><b>2010-11-25:</b> The login and compute nodes now have the full user environment by default.</p>
<h4>How should I submit my jobs?</h4>
<p>Always use <code>ppn=24</code> when requesting nodes as each node has 2x6 physical cores but 24 logical cores due to SMT.</p>
<h4>$FASTTMP is empty - where are my files from the parallel filesystem on Woody?</h4>
<p>The parallel filesystems on Woody and LiMa are different and not shared. However, <code>$FASTTMP</code> is used on both systems to point to the local parallel filesystem.</p>
<h4>How can I detect in my login scripts whether I'm on Woody or on LiMa?</h4>
<p>There are many different ways to detect this; one option is to test for <code>/wsfs/$GROUP/$USER</code> (Woody or some of the Transtec nodes) and <code>/lxfs/$GROUP/$USER</code> (LiMa).</p>
<p>In the future, we might provide an environment variable telling you the cluster (Transtec, Woody, TinyXY, LiMa).</p>
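<p>For bash login scripts, the directory test mentioned above could look like the following sketch. The function name <code>detect_cluster</code> and the optional path prefix argument (only there to make the function testable) are illustrative assumptions; the sketch also assumes that <code>$GROUP</code> and <code>$USER</code> are set in your environment.</p>

```shell
#!/bin/bash
# Prints "lima", "woody" or "unknown" depending on which parallel filesystem
# directory exists. "woody" also covers the Transtec nodes that mount /wsfs.
# The optional first argument is a path prefix used for testing only;
# leave it empty in a real login script.
detect_cluster() {
    local root="${1:-}"
    if [ -d "${root}/lxfs/${GROUP:-}/${USER:-}" ]; then
        echo "lima"
    elif [ -d "${root}/wsfs/${GROUP:-}/${USER:-}" ]; then
        echo "woody"
    else
        echo "unknown"
    fi
}

case "$(detect_cluster)" in
    lima)  echo "LiMa-specific settings go here" ;;
    woody) echo "Woody-specific settings go here" ;;
esac
```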
<h4>Should I recompile my binaries for LiMa?</h4>
<p>Many old binaries will run on LiMa, too. However, we recommend recompiling on the LiMa frontends as the default versions of many tools and libraries are newer on LiMa than on Woody.</p>
<h4>How do I start MPI jobs on LiMa? (Update: 2010-11-03; 2010-12-18)</h4>
<p>First of all, correct placement ("pinning") of processes is much more important on LiMa (and also TinyXY) than on Woody (or the Transtec cluster) as all modern nodes are ccNUMA and you only achieve best performance if data is accessed in local memory. Attend an HPC course if you do not know what ccNUMA is!</p>
<ol>
<li><s>Do not use the <code>mpirun</code> in the default $PATH if no MPI module is loaded. <em>This hopefully will change when regular user operation starts.</em></s> There is no <code>mpirun</code> in the default $PATH (unless you have the <em>openmpi</em> module loaded).</li>
<li>For Intel MPI, to use a start mechanism more or less compatible with the other RRZE clusters, use <code>/apps/rrze/bin/mpirun_rrze-intelmpd -intelmpd -pin 0_1_2_3_4_5_6_7_8_9_10_11 ...</code>. In this way, you can explicitly pin all your processes as on the other RRZE clusters. However, this algorithm does not scale up to the highest process counts.</li>
<li><s>Another option (currently only available on LiMa) is to use one of the <em>official</em> mechanisms of Intel MPI (assuming you use <code>bash</code> for your job script <em>and</em> <code>intelmpi/4.0.0.028-[intel|gnu]</code> is loaded):<br />
<code>export PPN=12</code><br />
<code>gen-hostlist.sh $PPN</code><br />
<code>mpiexec.hydra -print-rank-map -f nodes.$PBS_JOBID [-genv I_MPI_PIN] -n XX ./a.out</code><br />
Attention: pinning does not work properly in all circumstances for this start method. See chapter 3.2 of <code>/apps/intel/mpi/4.0.0.028/doc/Reference_Manual.pdf</code> for more details on <code>I_MPI_PIN</code> and friends.</s>
</li>
<li>Another option (currently only available on LiMa) is to use one of the <em>official</em> mechanisms of Intel MPI (assuming you use <code>bash</code> for your job script <em>and</em> <code>intelmpi/4.0.1.007-[intel|gnu]</code> is loaded):<br />
<code>export PPN=12</code><br />
<code>export NODES=`uniq $PBS_NODEFILE | wc -l`</code><br />
<code>export I_MPI_PIN=enable</code><br />
<code>mpiexec.hydra -rmk pbs -ppn $PPN -n $(( $PPN * $NODES )) -print-rank-map ./a.out</code><br />
Attention: pinning does not work properly in all circumstances for this start method. See chapter 3.2 of <code>/apps/intel/mpi/4.0.1.0007/doc/Reference_Manual.pdf</code> for more details on <code>I_MPI_PIN</code> and friends.
</li>
<li>There are of course many more possibilities to start MPI programs ...</li>
</ol>
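<p>The rank count passed to <code>-n</code> in the Intel MPI variant follows from the node file: with <code>ppn=24</code> requested and <code>PPN=12</code> used, one rank is started per physical core. A minimal sketch of that arithmetic is shown below; the helper name <code>count_nodes</code> is hypothetical, and <code>sort -u</code> is used instead of <code>uniq</code> so that the count stays correct even if the host names in the file are not grouped.</p>

```shell
#!/bin/bash
# Number of distinct nodes listed in a PBS node file
# (the file repeats each host name once per requested slot).
count_nodes() {
    sort -u "$1" | wc -l | tr -d ' '
}

PPN=12                                   # one rank per physical core
if [ -n "${PBS_NODEFILE:-}" ]; then      # only set inside a batch job
    NODES="$(count_nodes "$PBS_NODEFILE")"
    echo "would start $(( PPN * NODES )) ranks on $NODES nodes"
    # mpiexec.hydra -rmk pbs -ppn $PPN -n $(( PPN * NODES )) -print-rank-map ./a.out
fi
```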
<h4>Hints for Open MPI (2010-12-18)</h4>
<p>Starting from today, the <em>openmpi</em> modules on LiMa set <code>OMPI_MCA_orte_tmpdir_base=/dev/shm</code> because $TMPDIR points to a directory on the LXFS parallel filesystem; otherwise Open MPI might show bad performance for shared-memory communication.</p>
<p>Pinning can be achieved for Open MPI using <code>mpirun -npernode $ppn -bind-to-core -bycore -n $(( $ppn * $nodes )) ./a.out</code></p>
<h4>PBS output files already visible while the job is running (Update: 2010-11-04; 2010-11-25)</h4>
<p>As the compute nodes run without any local harddisk (yes, there is only RAM and nothing else locally on the compute nodes to store things), we are experimenting with a special PBS/MOM setting which writes the PBS output files (<code>*.[o|e]$PBS_JOBID</code> or what you specified using <code>PBS -[o|e]</code>) directly to the final destination. Please do not rename/delete these files while the job is running and do not be surprised that you see the files while the job is running.</p>
<p>The special settings are: <code>$spool_as_final_name</code> and <code>$nospool_dir_list</code> in the mom_config. I'm not sure if we will keep these settings in the final configuration. They save space in the RAM of the compute node but there are also administrative disadvantages ...</p>
<p><b>2010-11-25:</b> do not use <code>#PBS -o filename</code> or <code>#PBS -e filename</code> as PBS may cause trouble if the file already exists. Without the <code>-o/-e</code> PBS will generate files based on the script name or <code>#PBS -N name</code> and append <code>.[o|e]$PBS_JOBID</code>.</p>

        <script type="text/javascript">
        function init() {
            jQuery("#polls-form").submit(function(e) {
                var answered = false;

                jQuery("#polls-form .answer").each(function(i) {
                    if(this.checked) {
                        answered = true;
                        return false; // stop iterating once one answer is checked
                    }
                });
                if(!answered) {
                    alert("Please select an answer");
                    e.preventDefault();
                    e.stopPropagation();
                    return false;
                }
            });
        }
        jQuery(document).ready(init);
        </script>
        <div class="polls-container">
            <form name="post" action="http://blogs.fau.de/zeiser/2010/10/26/getting-started-on-lima/" method="post" class="polls-form" id="polls-form">

            <input type="hidden" name="poll_id" value="1" />
            <div class="polls-box polls-box-102">
                <div class="polls-box-outer-102">
                    <div class="polls-box-inner">
                        <div class="polls-box-top">
                            <div class="polls-question">
                                <div class="polls-question-outer">
                                    <div class="polls-question-inner">
                                        <div class="polls-question-top polls-question-top-102">Do you like the torque $spool_as_final_name and $nospool_dir_list settings on LiMa?</div>
                                    </div>
                                </div>
                            </div>
                            <div class="polls-answer polls-answer-102">
                                <span id="polls-answers">
                                    <span class="polls-answer-group">
                                        <span class="polls-answer-input"><input type="checkbox" name="poll_1[]" id="answer-id-1" value="1" class="answer polls-radiobutton"/>
                                        </span>
                                        <label class="polls-input-label" for="answer-id-1">
                                            <span class="polls-answer-span">no, I'm confused by the fact that the PBS output files are already visible while the job is running</span>
                                        </label>
                                        
                                    </span><br/>
                                    <span class="polls-answer-group">
                                        <span class="polls-answer-input"><input type="checkbox" name="poll_1[]" id="answer-id-2" value="2" class="answer polls-radiobutton"/>
                                        </span>
                                        <label class="polls-input-label" for="answer-id-2">
                                            <span class="polls-answer-span">no, because it's much more difficult to detect whether a job has already finished or is still running</span>
                                        </label>
                                        
                                    </span><br/>
                                    <span class="polls-answer-group">
                                        <span class="polls-answer-input"><input type="checkbox" name="poll_1[]" id="answer-id-3" value="3" class="answer polls-radiobutton"/>
                                        </span>
                                        <label class="polls-input-label" for="answer-id-3">
                                            <span class="polls-answer-span">yes, because I can see STDOUT/STDERR messages already while the job is running</span>
                                        </label>
                                        
                                    </span><br/>
                                    <span class="polls-answer-group">
                                        <span class="polls-answer-input"><input type="checkbox" name="poll_1[]" id="answer-id-4" value="4" class="answer polls-radiobutton"/>
                                        </span>
                                        <label class="polls-input-label" for="answer-id-4">
                                            <span class="polls-answer-span">I don't care</span>
                                        </label>
                                        
                                    </span><br/>
                                </span>
                                <span class="polls-clear"></span>
                            </div>
                            <div class="polls-vote">
                                <div class="polls-votebutton-outer">
                                    <span class="polls-links-102"><a class="polls-view-results" href="http://blogs.fau.de/zeiser/2010/10/26/getting-started-on-lima/?poll=1&amp;view=results">View results</a>
                                    </span>
                                    <span class="polls-clear"></span>
                                </div>
                            </div>
                        </div>
                    </div>
                </div>
            </div>

            </form>
        </div>
        
<p><em>You have to log in to the blog system using your IDM account to be able to vote!</em></p>
<h4>Where should I store intermediate files?</h4>
<p>It is best to avoid intermediate files and small but frequent file IO. There is no local hard disk. <code>/tmp</code> is part of the main memory! Please contact <code>hpc-support@rrze</code> for help with analyzing your IO requirements.</p>
<p>Large files which are read/written in large blocks should be put on <code>$FASTTMP</code>. Remember: as on Woody, there is no backup of <code>$FASTTMP</code>. There are currently also no quotas, but we will probably implement high-water-mark deletion as on Woody.</p>
<p>Small files should probably be put on <code>$WOODYHOME</code>.</p>
<p>Files you want to keep for a long time should be moved to <code>/home/vault/$GROUP/$USER</code>. The data access rate to <code>/home/vault</code> is currently limited on LiMa. Please use it with care.</p>
<h4>/tmp, $TMPDIR and $SCRATCH (2010-11-20 / update 2010-11-25)</h4>
<p><s>As the nodes are diskless, <code>/tmp</code> is part of a ramdisk and does not provide much temporary scratch space. As of 2010-11-20, a new environment variable <code>$SCRATCH</code> is defined by <code>use-rrze-module.csh/sh</code> which points to a node-specific directory on the parallel filesystem. The PBS configuration is not yet adapted, i.e. <code>$TMPDIR</code> within a job still points to a job-specific directory within the tiny <code>/tmp</code> directory.</s></p>
<p><b>2010-11-25:</b> <code>/scratch</code> (<code>$SCRATCH</code>) is a node-specific directory on LXFS. (At least once the compute nodes are rebooted.)</p>
<p><b>2010-11-25:</b> <code>$TMPDIR</code> now points to a job-private directory within $SCRATCH, i.e. is on LXFS; all data in $TMPDIR will be deleted at the end of the job. (At least once the compute nodes are rebooted.)</p>
<p><code>/tmp</code> is still small and part of the node's RAM.</p>
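<p>A minimal sketch of how a job script might keep its scratch data in the job-private <code>$TMPDIR</code> described above. The fallback to <code>/tmp</code> for runs outside the batch system and the file name <code>intermediate.dat</code> are illustrative assumptions, not part of the RRZE setup.</p>

```shell
#!/bin/bash
# Create a private scratch directory below $TMPDIR. The fallback to /tmp is
# only an assumption for runs outside the batch system.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/scratch.XXXXXX")

# Remove the scratch directory when the script exits.
trap 'rm -rf "$workdir"' EXIT

# Write intermediate data into the scratch directory (hypothetical file name).
echo "some intermediate result" > "$workdir/intermediate.dat"
ls -l "$workdir"
```

<p>Within a job, everything in <code>$TMPDIR</code> is deleted at the end of the job anyway; the explicit cleanup only matters when the fallback is used.</p>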
<h4>mpirun from my commercial code aborts during startup with connection refused</h4>
<p><s>On the LiMa nodes there are currently <em>true</em> <code>rsh</code> binaries installed.</s> Make sure that the MPI implementation does not try to start remote processes using <code>rsh</code> <s>as there are no RSH daemons running for security reasons</s> as RSH is not installed and there are also no symlinks from rsh to ssh as on the other RRZE systems. Enforce the usage of SSH. <s><em>The rsh binaries probably will be uninstalled before regular user operation and replaced by links to the ssh binary (as on most of the other RRZE clusters).</em></s></p>
<p>Update 2010-11-05: <code>rsh</code> and <code>rsh-server</code> have been uninstalled. There are however no links from rsh to ssh.</p>
<h4>There is obviously some software installed in /opt (e.g. /opt/intel/ and /opt/openmpi)</h4>
<p>Do not use any software from <code>/opt</code>. All these installations will be removed before regular user operation starts. RRZE-provided software is in <code>/apps</code> and in almost all cases accessible through <em>modules</em>. <code>/apps</code> is not (and will not be) shared between LiMa and Woody.</p>
<h4>My application tells that it could not allocate the required memory (Update 2010-11-12)</h4>
<p><em>Memory overcommit</em> is limited on LiMa. Thus, not only the resident (i.e. actually used) memory is relevant but also the virtual (i.e. total) memory, which is sometimes significantly higher. Complain to the application developer. There is currently no real workaround on LiMa.<br />As we are still experimenting with the <em>optimal</em> values of the overcommit limitation, there might be temporary changes (including times when overcommitment is not limited).</p>
<p>Memory issues might also come from an inappropriate <code>stacksize</code> line in <code>~/.cshrc</code> (or <code>~/.bashrc</code>). Try to remove any stacksize line in your login scripts.</p>
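<p>To see how far virtual and resident memory differ for a process, one can read both from <code>/proc</code>. The following is a rough sketch for Linux; the helper name <code>mem_usage</code> is made up for this illustration.</p>

```shell
#!/bin/bash
# Print the virtual (VmSize) and resident (VmRSS) memory of a process in kB,
# read from /proc/PID/status (Linux only). mem_usage is a hypothetical helper.
mem_usage() {
    awk '/^VmSize:/ { v = $2 } /^VmRSS:/ { r = $2 } END { print v, r }' "/proc/$1/status"
}

# Report the values for the current shell together with the relevant limits.
set -- $(mem_usage $$)
echo "virtual: $1 kB, resident: $2 kB"
echo "stack size limit: $(ulimit -s)"
echo "overcommit mode: $(cat /proc/sys/vm/overcommit_memory)"
```

<p>Replace <code>$$</code> with the PID of your application to inspect a running program. A large gap between the two values hints that the application is affected by the overcommit limit.</p>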
<h4>IO performance to <code>$FASTTMP</code> (<code>/lxfs/GROUP/USER</code>) seems to be very low (2010-11-03/2010-11-12)</h4>
<p><s>The default striping is not optimal yet; it uses only one OST, thus, performance is limited to roughly 100 MB/s. Use <code>lfs setstripe --count -1 --size 128m DIRECTORY</code> to manually activate striping over 16 OSTs.</s> <b>2010-11-25:</b> <em>RRZE will activate reasonable file striping, thus, it should not be necessary for normal users to set striping manually.</em> Modified striping only affects newly created files/subdirectories.</p>
<p>The stripe size (<code>--size</code> argument) should match your application's IO pattern.</p>
<h4>PBS commands do not work - but they used to work / Jobs are not started (2010-11-03)</h4>
<p>The PBS server currently has a severe bug: if a job requests too much memory and thus crashes the master node of the job, the PBS server stalls for quite a long time (several hours) and does not respond to any requests at all (although it is running on a different server). This may lead to hanging user commands or error messages like <em>pbs_iff: cannot read reply from pbs_server. No permission</em> or <em>cannot connect to server ladm1</em> or <em>Unauthorized Request</em>. And of course, while the PBS server process hangs, no new jobs are started.</p>
<p>If you are interested in the technical details: look at the bugzilla entry at<br />
<a href="http://www.clusterresources.com/bugzilla/show_bug.cgi?id=85">http://www.clusterresources.com/bugzilla/show_bug.cgi?id=85</a></p>
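<p>Scripts that call PBS commands can protect themselves against such hangs with a time limit. This is only a sketch: it relies on GNU <code>timeout</code> from coreutils, and the helper name <code>with_timeout</code> and the 30-second limit are arbitrary choices for this illustration.</p>

```shell
#!/bin/bash
# Run a command with a time limit so that a stalled server cannot hang the
# calling script. GNU timeout returns 124 if the limit was reached.
with_timeout() {
    local limit=$1
    shift
    timeout "$limit" "$@"
}

# Example: give up after 30 seconds if qstat is installed but does not answer.
if command -v qstat > /dev/null; then
    with_timeout 30 qstat -u "$USER" || echo "qstat failed or timed out (status $?)"
fi
```

<p>This does not fix the stalled server, of course; it only keeps your own scripts from blocking indefinitely.</p>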
<h4>mpiexec.hydra and LD_LIBRARY_PATH (2010-11-08)</h4>
<p>It currently looks like <code>mpiexec.hydra</code> from Intel-MPI 4.0.0.x does not respect <code>LD_LIBRARY_PATH</code> settings. Further investigations are currently being carried out. Right now it looks like an issue with the NEC login scripts on the compute nodes which overwrite the LD_LIBRARY_PATH setting.</p>
<p>Suggested workaround for now: <code>mpiexec.hydra ...  env LD_LIBRARY_PATH=$LD_LIBRARY_PATH ./a.out</code>.</p>
<p>Other MPI start mechanisms might be affected, too.</p>
<p><b>2010-11-25:</b> it's not clear if the workaround is still required as mpi-selector/Oscar-modules/env-switcher have been uninstalled in the meantime.</p>
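<p>Why the <code>env</code> workaround helps can be illustrated with a small simulation. The function <code>start_remote</code> below is only a stand-in for a start mechanism whose login scripts overwrite the variable; it is not what <code>mpiexec.hydra</code> actually does, and both paths are made up.</p>

```shell
#!/bin/bash
# Stand-in for a remote start mechanism whose login scripts overwrite
# LD_LIBRARY_PATH before the program is launched (simulation only).
start_remote() {
    LD_LIBRARY_PATH=/overwritten/by/login/script "$@"
}

export LD_LIBRARY_PATH=/my/app/lib

# Without the workaround the program sees the overwritten value ...
start_remote sh -c 'echo "plain: $LD_LIBRARY_PATH"'

# ... with "env VAR=value" the caller's value is set again right before exec.
start_remote env LD_LIBRARY_PATH="$LD_LIBRARY_PATH" sh -c 'echo "env:   $LD_LIBRARY_PATH"'
```

<p>The value is expanded by the submitting shell and travels as part of the command line, so nothing that runs before <code>env</code> can change it.</p>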
<h4>MPI-IO and Intel MPI 4.0.1.007 (2010-12-09)</h4>
<p>Probably not only a LiMa issue but also relevant on LiMa. MPI-IO to $FASTTMP with Intel MPI 4.0.1.007 (4.0up1) fails with "File locking failed" unless the following environment variables are set: <code>I_MPI_EXTRA_FILESYSTEM=on  I_MPI_EXTRA_FILESYSTEM_LIST=lustre</code>. Intel MPI up to 4.0.0.028 worked fine with or without these variables set.</p>
<p><code>I_MPI_EXTRA_FILESYSTEM=on  I_MPI_EXTRA_FILESYSTEM_LIST=lustre</code> are set by the intelmpi/4.0.1.007-* module (since 2011-02-02).</p>
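<p>If you use an Intel MPI installation that does not set these variables for you, a job script could set them itself. A small sketch that keeps any value that is already defined:</p>

```shell
#!/bin/bash
# Set the Lustre-related Intel MPI variables only if they are not set yet,
# so that values provided by a module are left untouched.
export I_MPI_EXTRA_FILESYSTEM="${I_MPI_EXTRA_FILESYSTEM:-on}"
export I_MPI_EXTRA_FILESYSTEM_LIST="${I_MPI_EXTRA_FILESYSTEM_LIST:-lustre}"

echo "I_MPI_EXTRA_FILESYSTEM=$I_MPI_EXTRA_FILESYSTEM"
echo "I_MPI_EXTRA_FILESYSTEM_LIST=$I_MPI_EXTRA_FILESYSTEM_LIST"
```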
<h4>STAR-CCM+ (2010-11-12)</h4>
<p>Here is a (partially tested) sample job script:</p>
<p>
[bash]<br />
#!/bin/bash -l<br />
#PBS -l nodes=4:ppn=24<br />
#PBS -l walltime=00:25:00<br />
#PBS -N simXYZ<br />
#PBS -j eo</p>
<p># star-ccm+ arguments<br />
CCMARGS="-load simxyz.sim"</p>
<p># specify the time you want to have to save results, etc.<br />
TIME4SAVE=1200</p>
<p># number of cores to use per node<br />
PPN=12</p>
<p># some MPI options; explicit pinning - must match the PPN line<br />
MPIRUN_OPTIONS="-cpu_bind=v,map_cpu:0,1,2,3,4,5,6,7,8,9,10,11"</p>
<p>### normally, no changes should be required below ###</p>
<p>module add star-ccm+/5.06.007</p>
<p>echo</p>
<p># count the number of nodes<br />
NODES=`uniq ${PBS_NODEFILE} | wc -l`<br />
# calculate the number of cores actually used<br />
CORES=$(( ${NODES} * ${PPN} ))</p>
<p># check if enough licenses should be available<br />
/apps/rrze/bin/check_lic.sh -c ${CDLMD_LICENSE_FILE} hpcdomains $(($CORES -1)) ccmpsuite 1<br />
. /apps/rrze/bin/check_autorequeue.sh</p>
<p># change to working directory<br />
cd ${PBS_O_WORKDIR}</p>
<p># generate new node file<br />
for node in `uniq ${PBS_NODEFILE}`; do<br />
  echo "${node}:${PPN}"<br />
done &gt; pbs_nodefile.${PBS_JOBID}</p>
<p># some exit/error traps for cleanup<br />
trap 'echo; echo "*** Signal TERM received: `date`"; echo; rm pbs_nodefile.${PBS_JOBID}; exit' TERM<br />
trap 'echo; echo "*** Signal KILL received: `date`"; echo; rm pbs_nodefile.${PBS_JOBID}; exit' KILL</p>
<p># automatically detect how much time this batch job requested and adjust the<br />
# sleep accordingly<br />
export TIME4SAVE<br />
( sleep ` qstat -f ${PBS_JOBID} | awk -v t=${TIME4SAVE}                 \<br />
    '{if ( $0 ~ /Resource_List.walltime/ )                              \<br />
        { split($3,duration,":");                                       \<br />
          print duration[1]*3600+duration[2]*60+duration[3]-t }}' ` &amp;&amp;  \<br />
  touch ABORT ) &gt;&amp; /dev/null  &amp;<br />
SLEEP_ID=$!</p>
<p># start STAR-CCM+<br />
starccm+ -batch -rsh ssh -mppflags "$MPIRUN_OPTIONS" -np ${CORES} -machinefile pbs_nodefile.${PBS_JOBID} ${CCMARGS}</p>
<p># final clean up<br />
rm pbs_nodefile.${PBS_JOBID}<br />
pkill -P ${SLEEP_ID}<br />
[/bash]</p>
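<p>The least obvious part of the job script is the <code>awk</code> call which turns the requested walltime into the number of seconds to sleep before <code>ABORT</code> is touched. The same arithmetic can be sketched in isolation; the function names are made up for this illustration.</p>

```shell
#!/bin/bash
# Convert a walltime given as HH:MM:SS into seconds.
walltime_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds to sleep before touching ABORT: walltime minus the time to save.
sleep_before_abort() {
    echo $(( $(walltime_to_seconds "$1") - $2 ))
}

# Walltime and TIME4SAVE as used in the job script.
sleep_before_abort 00:25:00 1200
```

<p>With the values from the job script this prints 300, i.e. <code>ABORT</code> is touched five minutes after the job starts, leaving 20 minutes of the 25-minute walltime for saving results.</p>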
<h3>Solved issues</h3>
<ul>
<li>cmake - (2010-10-26) - rebuilt from sources; should work now without dirty LD_LIBRARY_PATH settings</li>
<li>svn - (2010-10-26) - installed CentOS rpm on the login nodes; the version unfortunately is a little bit old (1.4.2); however, there is no real chance to get a newer one on LiMa. Go to <em>cshpc</em> to have at least 1.5.0</li>
<li>qsub from compute nodes - (2010-10-26) - should work now; added <code>allow_node_submit=true</code> to PBS server</li>
<li>vim - (2010-10-27) - installed CentOS rpm of vim-enhanced on the login nodes</li>
<li>autologout - (2010-10-27) - will be disabled for CSH once <code>/apps/rrze/etc/use-rrze-modules.csh</code> is sourced</li>
<li>xmgrace, gnuplot, (xauth) - (2010-11-02) - installed on the login nodes; version from CentOS rpm</li>
<li>clock skew on /lxfs ($FASTTMP) is now hopefully really fixed (2010-11-04)</li>
<li><code>rsh</code> and <code>rsh-server</code> have been uninstalled from compute/login/admin nodes (2010-11-05)</li>
<li>several usability improvements added to <code>use-rrze-modules.csh/sh</code> (2010-11-19)</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2010/10/26/getting-started-on-lima/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LiMa, LIKWID and Turbo Boost</title>
		<link>http://blogs.fau.de/zeiser/2010/10/13/lima-likwid-und-der-turbo-boost/</link>
		<comments>http://blogs.fau.de/zeiser/2010/10/13/lima-likwid-und-der-turbo-boost/#comments</comments>
		<pubDate>Wed, 13 Oct 2010 18:00:26 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[HPC-Cluster@RRZE]]></category>
		<category><![CDATA[LIKWID]]></category>
		<category><![CDATA[LiMa]]></category>
		<category><![CDATA[performance tuning]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5533</guid>
		<description><![CDATA[<p>With the <em>Nehalem</em> processors (Xeon X55xx and Core i7), Intel introduced the <em><a href="http://www.intel.com/technology/turboboost/">Turbo Boost</a></em> mode in which, put very simply, individual processor cores can automatically run at a higher clock rate when only parts of the processor chip are in use and there is thus "headroom" in power consumption, voltage and temperature. The LiMa cluster uses Intel Westmere processors with 2.66 GHz (Intel <a href="http://ark.intel.com/Product.aspx?id=47922">Xeon X5650</a>). In principle, these processors allow (up to) two Turbo Boost steps (+2x133 MHz) even when all cores are in use, and up to three Turbo Boost steps (+3x133 MHz) when at most two of the six cores of a processor chip are in use (cf. Table 2 in the <a href="http://www.intel.com/Assets/en_US/PDF/specupdate/323372.pdf">Intel Xeon Processor 5600 Series Specification Update</a> of September 2010). This means that under favorable conditions all processor cores run at 2.93 GHz even under full load, although one actually bought only a 2.66 GHz processor.</p><p>The adjacent figure shows that almost two full Turbo Boost steps are indeed possible under full load. The clock frequency of all physical cores in the node was measured with <a href="http://code.google.com/p/likwid/">LIKWID</a> at a resolution of 5 seconds while the multi-threaded LINPACK version from Intel's MKL was running on the node. Before the LINPACK processes start, the processor cores have clocked down because of the <em>ondemand</em> frequency setting in the Linux operating system. As soon as "load" is generated, the processors clock up. Once the processors have "warmed up" after several seconds, the clock frequency drops only slightly from 2.93 GHz to about 2.90 GHz. At the end of, or shortly after, the LINPACK run, the processors briefly clock up once more: the load has decreased, so the thermal and electrical limits for Turbo Boost mode are no longer reached, but the ondemand governor of the Linux operating system has not yet clocked the processors down. Essentially only two curves can be seen in the figure although there are actually 12, because all cores of a socket practically always run at the same frequency.</p><p>Unfortunately, not all X5650 processors always run with almost two Turbo Boost steps, i.e. 2.93 GHz, under full load, as the second figure shows. Here the cores clock down to "only" 2.7 GHz during large parts of the LINPACK run, which reduces the measured node performance from about 128.5 GFlop/s to 120.5 GFlop/s. That is a loss of more than 5% which will certainly also show up in one form or another in real applications and not only in the synthetic LINPACK.</p><p>For now, one can only speculate about the causes of the lower turbo clock of the second node. It is demonstrably not the thermal paste between the processors and the heat sinks. Nor is it the power supply or the position in the rack, since moving the compute node to a different enclosure in a different rack brought no improvement. BIOS version, CMOS settings and CPU stepping should hopefully be the same on all nodes as well. It may well be that two processors out of hundreds have a defect, but how likely is it that exactly these two processors end up in the same machine ... For the moment I would therefore suspect "tolerances" of the mainboards with a negative effect as the most likely cause. But NEC will certainly find out the details, in its own interest and in ours ...</p><p>In any case, performance measurement tools such as <a href="http://code.google.com/p/likwid/">LIKWID</a> also pay off for the acceptance testing of HPC clusters.</p><p>Here are, in essence, the commands I used for the measurement (at least LIKWID 2.0 is required because of the daemon mode, and /dev/cpu/*/msr must be readable and writable by the calling user):</p><p>[shell]</p><p>/opt/likwid/2.0/bin/likwid-perfctr -c 0-11 -g CLOCK -d 5 | tee /tmp/clock-speed-`hostname`-`date +%Y%m%d-%H%M`.log &gt; /dev/null &amp;</p><p>sleep 15</p><p>env OMP_NUM_THREADS=12 taskset -c 0-11 /opt/intel/Compiler/11.1/073/mkl/benchmarks/linpack/xlinpack_xeon64 lininput_xeon64-50k</p><p>sleep 15</p><p>kill $! &gt;&amp; /dev/null</p><p>[/shell]</p><p>As the clock frequency is a derived quantity, individual CPUs may report <em>nan</em> as clock frequency when no instructions are executed. But of course this only happens when no program is running on the processor core.</p>]]></description>
				<content:encoded><![CDATA[<p>With the <em>Nehalem</em> processors (Xeon X55xx and Core i7), Intel introduced the <em><a href="http://www.intel.com/technology/turboboost/">Turbo Boost</a></em> mode in which, put very simply, individual processor cores can automatically run at a higher clock rate when only parts of the processor chip are in use and there is thus &#8220;headroom&#8221; in power consumption, voltage and temperature. The LiMa cluster uses Intel Westmere processors with 2.66 GHz (Intel <a href="http://ark.intel.com/Product.aspx?id=47922">Xeon X5650</a>). In principle, these processors allow (up to) two Turbo Boost steps (+2&#215;133 MHz) even when all cores are in use, and up to three Turbo Boost steps (+3&#215;133 MHz) when at most two of the six cores of a processor chip are in use (cf. Table 2 in the <a href="http://www.intel.com/Assets/en_US/PDF/specupdate/323372.pdf">Intel Xeon Processor 5600 Series Specification Update</a> of September 2010). This means that under favorable conditions all processor cores run at 2.93 GHz even under full load, although one actually bought only a 2.66 GHz processor.</p>
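<p>The clock rates quoted above follow directly from the base clock and the number of Turbo Boost steps. A small sketch of the arithmetic, using the rounded figures from the text (2660 MHz base clock, 133 MHz per step); the function name <code>turbo_ghz</code> is made up for this illustration.</p>

```shell
#!/bin/bash
# Clock frequency in GHz for a base clock (MHz) plus a number of Turbo Boost
# steps of 133 MHz each, using the rounded figures from the text.
turbo_ghz() {
    awk -v base="$1" -v steps="$2" 'BEGIN { printf "%.2f\n", (base + steps * 133) / 1000 }'
}

turbo_ghz 2660 0   # nominal clock
turbo_ghz 2660 2   # two steps, possible with all cores in use
turbo_ghz 2660 3   # three steps, at most two of six cores in use
```

<p>This prints 2.66, 2.93 and 3.06 GHz, respectively.</p>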
<div id="attachment_5535" class="wp-caption alignleft" style="width: 310px"><a href="http://blogs.fau.de/zeiser/files/2010/10/clock-speed-l0943.png"><img src="http://blogs.fau.de/zeiser/files/2010/10/clock-speed-l0943-300x231.png" alt="" width="300" height="231" class="size-medium wp-image-5535" /></a><p class="wp-caption-text">Time-resolved clock frequency during the LINPACK run on a <em>good</em> node</p></div>
<p>The adjacent figure shows that almost two full Turbo Boost steps are indeed possible under full load. The clock frequency of all physical cores in the node was measured with <a href="http://code.google.com/p/likwid/">LIKWID</a> at a resolution of 5 seconds while the multi-threaded LINPACK version from Intel&#8217;s MKL was running on the node. Before the LINPACK processes start, the processor cores have clocked down because of the <em>ondemand</em> frequency setting in the Linux operating system. As soon as &#8220;load&#8221; is generated, the processors clock up. Once the processors have &#8220;warmed up&#8221; after several seconds, the clock frequency drops only slightly from 2.93 GHz to about 2.90 GHz. At the end of, or shortly after, the LINPACK run, the processors briefly clock up once more: the load has decreased, so the thermal and electrical limits for Turbo Boost mode are no longer reached, but the ondemand governor of the Linux operating system has not yet clocked the processors down. Essentially only two curves can be seen in the figure although there are actually 12, because all cores of a socket practically always run at the same frequency.</p>
<div style="clear:both"></div>
<div id="attachment_5536" class="wp-caption alignright" style="width: 310px"><a href="http://blogs.fau.de/zeiser/files/2010/10/clock-speed-l1342-compare20101012_13.png"><img src="http://blogs.fau.de/zeiser/files/2010/10/clock-speed-l1342-compare20101012_13-300x231.png" alt="" width="300" height="231" class="size-medium wp-image-5536" /></a><p class="wp-caption-text">Time-resolved clock frequency during the LINPACK run on a <em>bad</em> node</p></div>
<p>Unfortunately, not all X5650 processors always run with almost two Turbo Boost steps, i.e. 2.93 GHz, under full load, as the second figure shows. Here the cores clock down to &#8220;only&#8221; 2.7 GHz during large parts of the LINPACK run, which reduces the measured node performance from about 128.5 GFlop/s to 120.5 GFlop/s. That is a loss of more than 5% which will certainly also show up in one form or another in real applications and not only in the synthetic LINPACK.</p>
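<p>The size of the loss can be checked from the figures above. The following sketch computes the relative drop in node performance and, for comparison, in clock frequency; the function name <code>loss_percent</code> is made up for this illustration.</p>

```shell
#!/bin/bash
# Relative loss in percent when a value drops from "good" to "bad".
loss_percent() {
    awk -v good="$1" -v bad="$2" 'BEGIN { printf "%.1f\n", (good - bad) / good * 100 }'
}

loss_percent 128.5 120.5   # node performance in GFlop/s
loss_percent 2.93 2.7      # clock frequency in GHz
```

<p>This prints 6.2 and 7.8: the measured LINPACK performance drops by about 6.2% while the clock frequency during the throttled phase is about 7.8% lower.</p>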
<p>For now, one can only speculate about the causes of the lower turbo clock of the second node. It is demonstrably not the thermal paste between the processors and the heat sinks. Nor is it the power supply or the position in the rack, since moving the compute node to a different enclosure in a different rack brought no improvement. BIOS version, CMOS settings and CPU stepping should hopefully be the same on all nodes as well. It may well be that two processors out of hundreds have a defect, but how likely is it that exactly these two processors end up in the same machine &#8230; For the moment I would therefore suspect &#8220;tolerances&#8221; of the mainboards with a negative effect as the most likely cause. But NEC will certainly find out the details, in its own interest and in ours &#8230;</p>
<p>In any case, performance measurement tools such as <a href="http://code.google.com/p/likwid/">LIKWID</a> also pay off for the acceptance testing of HPC clusters.</p>
<p>Here are, in essence, the commands I used for the measurement (at least LIKWID 2.0 is required because of the daemon mode, and <code>/dev/cpu/*/msr</code> must be readable and writable by the calling user):</p>
<p>[shell]<br />
/opt/likwid/2.0/bin/likwid-perfctr -c 0-11 -g CLOCK -d 5 | tee /tmp/clock-speed-`hostname`-`date +%Y%m%d-%H%M`.log &gt; /dev/null &amp;<br />
sleep 15<br />
env OMP_NUM_THREADS=12 taskset -c 0-11 /opt/intel/Compiler/11.1/073/mkl/benchmarks/linpack/xlinpack_xeon64 lininput_xeon64-50k<br />
sleep 15<br />
kill $! &gt;&amp; /dev/null<br />
[/shell]</p>
<p>As the clock frequency is a derived quantity, individual CPUs may report <em>nan</em> as clock frequency when no instructions are executed. But of course this only happens when no program is running on the processor core.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2010/10/13/lima-likwid-und-der-turbo-boost/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>LiMa, Refrigerators and Toasters</title>
		<link>http://blogs.fau.de/zeiser/2010/10/05/lima-kuhlschranke-und-toster/</link>
		<comments>http://blogs.fau.de/zeiser/2010/10/05/lima-kuhlschranke-und-toster/#comments</comments>
		<pubDate>Tue, 05 Oct 2010 16:06:19 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[HPC-Cluster@RRZE]]></category>
		<category><![CDATA[LiMa]]></category>
		<category><![CDATA[TinyFat]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5514</guid>
		<description><![CDATA[<p>Our new HPC cluster has closed racks with cooling units in between (cf. <a href="http://blogs.fau.de/zeiser/2010/09/21/rechnerzuwachs-und-generationswechsel-bei-den-hpc-clustern-am-rrze/">http://blogs.fau.de/zeiser/2010/09/21/rechnerzuwachs-und-generationswechsel-bei-den-hpc-clustern-am-rrze/</a>). In reply to my colleague's mail asking for DNS entries for the <em>refrigerators</em>, we got <em>...once you get to the toasters, you will use IPv6, though</em>. The compute nodes are indeed something like <em>toasters</em>, but they have IPv4 addresses nevertheless.</p><h3>Current LiMa status:</h3><p>Back to serious matters: the mechanical assembly and the cabling of the cluster are practically complete, and last Thursday (30 September 2010) all nodes of the new cluster were switched on for the first time. So things are moving forward ....</p><h3>A few more impressions of the installation:</h3>]]></description>
				<content:encoded><![CDATA[<p>Our new HPC cluster has closed racks with cooling units in between (cf. <a href="http://blogs.fau.de/zeiser/2010/09/21/rechnerzuwachs-und-generationswechsel-bei-den-hpc-clustern-am-rrze/">http://blogs.fau.de/zeiser/2010/09/21/rechnerzuwachs-und-generationswechsel-bei-den-hpc-clustern-am-rrze/</a>). In reply to my colleague&#8217;s mail asking for DNS entries for the <em>refrigerators</em>, we got <em>&#8230;once you get to the toasters, you will use IPv6, though</em>. The compute nodes are indeed something like <em>toasters</em>, but they have IPv4 addresses nevertheless.</p>
<h3>Current LiMa status:</h3>
<p>Back to serious matters: the mechanical assembly and the cabling of the cluster are practically complete, and last Thursday (30 September 2010) all nodes of the new cluster were switched on for the first time. So things are moving forward &#8230;.</p>
<h3>A few more impressions of the installation:</h3>
<p><div id="attachment_5516" class="wp-caption alignnone" style="width: 310px"><a href="http://blogs.fau.de/zeiser/files/2010/10/verlegung-rohre2.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/verlegung-rohre2-300x200.jpg" alt="" width="300" height="200" class="size-medium wp-image-5516" /></a><p class="wp-caption-text">Laying the pipes in the raised floor</p></div><br />
<div id="attachment_5518" class="wp-caption alignnone" style="width: 310px"><a href="http://blogs.fau.de/zeiser/files/2010/10/rohre-mit-isolierung-im-doppelboden.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/rohre-mit-isolierung-im-doppelboden-300x200.jpg" alt="" width="300" height="200" class="size-medium wp-image-5518" /></a><p class="wp-caption-text">Insulated cold-water pipes in the raised floor. Note the ratio of the water pipe diameter to the thickness of the raised-floor supports!</p></div></p>
<div style="clear:both"></div>
<p><div id="attachment_5522" class="wp-caption alignnone" style="width: 310px"><a href="http://blogs.fau.de/zeiser/files/2010/10/zwischenlagerung.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/zwischenlagerung-300x200.jpg" alt="" width="300" height="200" class="size-medium wp-image-5522" /></a><p class="wp-caption-text">Temporary storage of the compute nodes and cables</p></div><br />
<div id="attachment_5517" class="wp-caption alignnone" style="width: 210px"><a href="http://blogs.fau.de/zeiser/files/2010/10/doppelboden-mit-kabel2.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/doppelboden-mit-kabel2-200x300.jpg" alt="" width="200" height="300" class="size-medium wp-image-5517" /></a><p class="wp-caption-text">Pipes and InfiniBand cables in the raised floor</p></div></p>
<div style="clear:both"></div>
<p><div id="attachment_5519" class="wp-caption alignnone" style="width: 210px"><a href="http://blogs.fau.de/zeiser/files/2010/10/ib-switch-vorderseite2.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/ib-switch-vorderseite2-200x300.jpg" alt="" width="200" height="300" class="size-medium wp-image-5519" /></a><p class="wp-caption-text">Front of the 324-port InfiniBand switch with 12x cables to further leaf switches</p></div><br />
<div id="attachment_5520" class="wp-caption alignnone" style="width: 210px"><a href="http://blogs.fau.de/zeiser/files/2010/10/ib-switch-rueckseite.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/ib-switch-rueckseite-200x300.jpg" alt="" width="200" height="300" class="size-medium wp-image-5520" /></a><p class="wp-caption-text">Back of the InfiniBand switch: a good 300 copper cables and some optical InfiniBand cables</p></div></p>
<div style="clear:both"></div>
<p><div id="attachment_5529" class="wp-caption alignnone" style="width: 210px"><a href="http://blogs.fau.de/zeiser/files/2010/10/lima-racks.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/lima-racks-200x300.jpg" alt="" width="200" height="300" class="size-medium wp-image-5529" /></a><p class="wp-caption-text">Section of the LiMa rack row</p></div><br />
<div id="attachment_5521" class="wp-caption alignnone" style="width: 210px"><a href="http://blogs.fau.de/zeiser/files/2010/10/tinyfat-mit-kuehleinheit.jpg"><img src="http://blogs.fau.de/zeiser/files/2010/10/tinyfat-mit-kuehleinheit-200x300.jpg" alt="" width="200" height="300" class="size-medium wp-image-5521" /></a><p class="wp-caption-text">TinyFat rack and a cooling unit</p></div></p>
<div style="clear:both"></div>
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2010/10/05/lima-kuhlschranke-und-toster/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Recipe for building OpenFOAM-1.7.1 with Intel Compilers and Intel MPI</title>
		<link>http://blogs.fau.de/zeiser/2010/09/25/recipe-for-building-openfoam-1-7-1-with-intel-compilers-and-intel-mpi/</link>
		<comments>http://blogs.fau.de/zeiser/2010/09/25/recipe-for-building-openfoam-1-7-1-with-intel-compilers-and-intel-mpi/#comments</comments>
		<pubDate>Sat, 25 Sep 2010 16:17:59 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[OpenFOAM]]></category>
		<category><![CDATA[TinyBlue]]></category>
		<category><![CDATA[Woody-Cluster]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5499</guid>
		<description><![CDATA[<p>Compared with other software, installing OpenFOAM is (still) a nightmare. They use their very own build system, there are tons of environment variables to set, etc. But it seems that users in academia and industry accept OpenFOAM nevertheless. For release 1.7.1, I took the time to create a <a href="http://blogs.fau.de/zeiser/files/2010/09/of-1.7.1-rrze-install.sh.txt">little recipe</a> (in some parts very specifically tailored to RRZE's installation of software packages) to more or less automatically build OpenFOAM and some accompanying third-party packages from scratch using the <em>Intel Compilers (icc/icpc)</em> and <em>Intel MPI</em> instead of GCC and Open MPI (only Qt and ParaView are still built using gcc). The script is provided as-is without any guarantee that it works elsewhere and of course also without any support. The script assumes that the required source code packages have already been downloaded. Where necessary, the unpacked sources are patched and the compilation commands are executed. Finally, two new tarballs are created which contain the required "output" for a clean binary installation, i.e. intermediate output files (e.g. *.dep) are not included ...</p><p></p><p>Compilation takes ages, but that's not really surprising. Merely extracting the source tarballs yields 1.6 GB in almost 45k files/directories. After compilation (although neither Open MPI nor GCC are built) the size has grown to 6.5 GB or 120k files. If all intermediate compilation files are removed, there are still about 1 GB or 30k files/directories remaining in my "clean installation" (with only the Qt/ParaView libraries/binaries in the ThirdParty tree).</p><p></p><p>RRZE users find OpenFOAM-1.7.1 as a module on Woody and TinyBlue. The binaries used for Woody and TinyBlue are slightly different as they were natively compiled on SuSE SLES 10SP3 and Ubuntu 8.04, respectively. The main difference should only be in the Qt/ParaView part as SLES10 and Ubuntu 8.04 come with different Python versions. ParaView should also be compiled with MPI support.</p><p></p><p></p><p><strong>Note (2012-06-08):</strong> to be able to compile src/finiteVolume/fields/fvPatchFields/constraint/wedge/wedgeFvPatchScalarField.C with recent versions of the Intel compiler, one has to patch this file to avoid a <em>no instance of overloaded function "Foam::operator==" matches the argument list</em> error message; cf. <a href="http://www.cfd-online.com/Forums/openfoam-installation/101961-compiling-2-1-0-rhel6-2-icc.html">http://www.cfd-online.com/Forums/openfoam-installation/101961-compiling-2-1-0-rhel6-2-icc.html</a> and <a href="https://github.com/OpenFOAM/OpenFOAM-2.1.x/commit/8cf1d398d16551c4931d20d9fc3e42957d0f93ca">https://github.com/OpenFOAM/OpenFOAM-2.1.x/commit/8cf1d398d16551c4931d20d9fc3e42957d0f93ca</a>. These links are for OF-2.1.x but the fix works for OF-1.7.1 as well.</p>]]></description>
				<content:encoded><![CDATA[<p>Compared with other software, installing OpenFOAM is (still) a nightmare. It has its very own build system, there are tons of environment variables to set, etc. Nevertheless, users in academia and industry seem to accept OpenFOAM. For release 1.7.1, I took the time to create a <a href="http://blogs.fau.de/zeiser/files/2010/09/of-1.7.1-rrze-install.sh.txt">little recipe</a> (in some parts tailored very specifically to RRZE&#8217;s installation of software packages) that more or less automatically builds OpenFOAM and some accompanying third-party packages from scratch using the <em>Intel compilers (icc/icpc)</em> and <em>Intel MPI</em> instead of GCC and Open MPI (only Qt and ParaView are still built with gcc). The script is provided as-is, without any guarantee that it works elsewhere and, of course, without any support. It assumes that the required source code packages have already been downloaded. Where necessary, it patches the unpacked sources and then runs the compilation commands. Finally, it creates two new tarballs that contain the &#8220;output&#8221; required for a clean binary installation, i.e. intermediate output files (e.g. *.dep) are not included &#8230;</p>
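<p>The final packaging step can be illustrated with a minimal Python sketch. It is not taken from the install script linked above; it only shows one way to pack a build tree while leaving out intermediate files. The function names, the <code>*.o</code> pattern, and the directory and archive names in the usage note are assumptions made for illustration; only <code>*.dep</code> is mentioned above.</p>

```python
import fnmatch
import os
import tarfile

# Patterns for intermediate build files. Only *.dep is named in the post;
# *.o is an assumption added for illustration.
INTERMEDIATE_PATTERNS = ("*.dep", "*.o")


def is_intermediate(name, patterns=INTERMEDIATE_PATTERNS):
    """Return True if the base name of `name` matches one of the patterns."""
    base = os.path.basename(name)
    return any(fnmatch.fnmatch(base, pattern) for pattern in patterns)


def make_clean_tarball(source_dir, tar_path, patterns=INTERMEDIATE_PATTERNS):
    """Pack source_dir into a gzip-compressed tarball without intermediate files."""

    def keep(tarinfo):
        # Returning None tells tarfile.add() to leave this member out.
        if tarinfo.isfile() and is_intermediate(tarinfo.name, patterns):
            return None
        return tarinfo

    # Store members relative to the top-level directory name only.
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(source_dir, arcname=top_level, filter=keep)
```

<p>A call such as <code>make_clean_tarball("OpenFOAM-1.7.1", "/tmp/OpenFOAM-1.7.1-bin.tar.gz")</code> (hypothetical names) would write the archive; the target path should lie outside the packed tree so that the archive does not pick up itself.</p>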
<p>Compilation takes ages, but that&#8217;s not really surprising. Just extracting the source tarballs yields 1.6 GB in almost 45k files/directories. After compilation (although neither Open MPI nor GCC is built), the size grows to 6.5 GB in 120k files. Even after all intermediate compilation files are removed, about 1 GB in 30k files/directories remains in my &#8220;clean installation&#8221; (with only the Qt/ParaView libraries/binaries in the ThirdParty tree).</p>
<p>RRZE users will find OpenFOAM-1.7.1 as a module on Woody and TinyBlue. The binaries for Woody and TinyBlue differ slightly because they were compiled natively on SuSE SLES 10 SP3 and Ubuntu 8.04, respectively. The main difference should be confined to the Qt/ParaView part, as SLES 10 and Ubuntu 8.04 come with different Python versions. ParaView should also be compiled with MPI support.</p>
<p><strong>Note (2012-06-08):</strong> to compile <code>src/finiteVolume/fields/fvPatchFields/constraint/wedge/wedgeFvPatchScalarField.C</code> with recent versions of the Intel compiler, this file has to be patched to avoid the error message <em>no instance of overloaded function &#8220;Foam::operator==&#8221; matches the argument list</em>; cf. <a href="http://www.cfd-online.com/Forums/openfoam-installation/101961-compiling-2-1-0-rhel6-2-icc.html">http://www.cfd-online.com/Forums/openfoam-installation/101961-compiling-2-1-0-rhel6-2-icc.html</a> and <a href="https://github.com/OpenFOAM/OpenFOAM-2.1.x/commit/8cf1d398d16551c4931d20d9fc3e42957d0f93ca">https://github.com/OpenFOAM/OpenFOAM-2.1.x/commit/8cf1d398d16551c4931d20d9fc3e42957d0f93ca</a>. These links refer to OF-2.1.x, but the fix works for OF-1.7.1 as well.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2010/09/25/recipe-for-building-openfoam-1-7-1-with-intel-compilers-and-intel-mpi/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Halftime for the SKALB project</title>
		<link>http://blogs.fau.de/zeiser/2010/09/24/halbzeit-beim-projekt-skalb/</link>
		<comments>http://blogs.fau.de/zeiser/2010/09/24/halbzeit-beim-projekt-skalb/#comments</comments>
		<pubDate>Fri, 24 Sep 2010 12:38:46 +0000</pubDate>
		<dc:creator>Thomas Zeiser</dc:creator>
				<category><![CDATA[SKALB]]></category>

		<guid isPermaLink="false">http://blogs.fau.de/zeiser/?p=5493</guid>
		<description><![CDATA[<p>Half of the project duration of the BMBF HPC joint project "SKALB" (lattice Boltzmann methods for scalable multi-physics applications) has now elapsed. Anyone who wants to learn about the results so far will find, on the project website <a href="http://www.skalb.de/">www.skalb.de</a> in the section <em>Ergebnisse &amp; Showcases</em> (results and showcases), a list of talks and publications as well as the management summaries of the interim project reports for the first three half-year periods.</p>]]></description>
				<content:encoded><![CDATA[<p>Half of the project duration of the BMBF HPC joint project &#8220;SKALB&#8221; (lattice Boltzmann methods for scalable multi-physics applications) has now elapsed. Anyone who wants to learn about the results so far will find, on the project website <a href="http://www.skalb.de/">www.skalb.de</a> in the section <em>Ergebnisse &amp; Showcases</em> (results and showcases), a list of talks and publications as well as the management summaries of the interim project reports for the first three half-year periods.</p>
]]></content:encoded>
			<wfw:commentRss>http://blogs.fau.de/zeiser/2010/09/24/halbzeit-beim-projekt-skalb/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
