As it happens, our paper on “Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers” has been accepted for the Workshop on Large-Scale Parallel Processing 2008 in Miami, Florida. It shows how to circumvent the memory access bottlenecks that appear on the Niagara2 CPU when you don’t pay sufficient attention to alignment and aliasing problems.
You can take a look at the preprint: arXiv:0712.2302.