Thanks to Sun Microsystems and the kind people from RWTH Aachen we had access to a pre-production UltraSPARC T2 (a.k.a. “Niagara 2″) system for some tests. We didn’t manage to get the whole RRZE benchmark suite running but could produce some pretty interesting low-level stuff. The peculiar way the N2 addresses its four built-in memory controllers leads to severe congestion if the mutual alignment of memory streams is unfortunate. This can be seen, e.g. with Lattice-Boltzmann codes and, of course, also with STREAM.
That said, if you know what you’re doing and somehow manage to get what we have learned into applications, the Niagara2 is a pretty interesting processor. In a single socket it has a nominal memory bandwidth of >60 GB/s (40 read, 20 write) of which about a third can be actually measured. At a peak performance of 11 GFlop/s (8 cores at 1.4 GHz), this makes for a pretty impressive machine balance which is far beyond any other cache-based CPU. And finally, the multi-threaded architecture is much less strange and hard to grasp than, e.g., the Cell design.
If you want to know more, here’s a presentation we prepared for the recent “SunDay” at RRZE. Please bear in mind that all of this is still preliminary data and will have to be confirmed on production hardware.
All about the N2′s microarchitecture can be found in this neat document: