BoF session at Supercomputing 2011, November 13-18, 2011, Seattle, WA:
1000 x 0 = 0. Single-node optimisation does matter.
Bettina Krammer (UVSQ/ECR), Georg Hager (RRZE), Jan Treibig (RRZE)
With current systems approaching 10 petaflops and future systems potentially reaching exaflop performance, much attention has been paid to the scalability of applications, runtime environments and tools. One important aspect is often neglected: single-node optimisation is still necessary and contributes significantly to overall performance, whether measured in execution time, resource contention or power consumption. This is evident even from the most basic performance and scalability models. Contrary to popular belief, however, the problem of achieving good single-node performance is far from solved. We present experiences with real-world applications using our approach of combining various performance tools, from high-level analysis (e.g. MPI communication behaviour) down to low-level analysis (e.g. memory behaviour, microbenchmarking). We address both multi-node and single-node issues but put the emphasis on single-node optimisation, since application developers often miss tuning opportunities there. We invite application, middleware and tools developers to discuss their experiences and requirements with us.
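As an illustration of what a "most basic performance and scalability model" can look like, here is a minimal Python sketch. It is not taken from the session material. It assumes perfectly parallel compute work plus a fixed communication overhead per run, and the names `runtime`, `speedup`, `work`, `node_rate` and `comm_time`, as well as all numbers, are hypothetical.

```python
def runtime(nodes, work, node_rate, comm_time=0.0):
    """Time to solution under an assumed toy model:
    perfectly parallel compute part plus a fixed communication overhead."""
    return work / (nodes * node_rate) + comm_time


def speedup(nodes, work, node_rate, comm_time=0.0):
    """Speedup relative to a single node, which needs no communication."""
    return runtime(1, work, node_rate) / runtime(nodes, work, node_rate, comm_time)


if __name__ == "__main__":
    # Hypothetical numbers: same problem, same machine, same communication cost.
    work, nodes, comm_time = 1000.0, 1000, 0.5
    for label, node_rate in (("untuned", 1.0), ("tuned", 4.0)):
        t = runtime(nodes, work, node_rate, comm_time)
        s = speedup(nodes, work, node_rate, comm_time)
        print(f"{label}: runtime {t:.2f}, speedup {s:.1f}")
```

With these assumed numbers, the code with the faster single-node rate finishes in half the time on 1000 nodes, even though its speedup figure is lower. In this toy model, good scalability alone says little about time to solution, and the node-level rate multiplies into everything else.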
- Introduction (Krammer/Hager/Treibig) – 1-BoF-single-node-SC11
- Case study: Implementing the Lattice-Boltzmann Method on modern multicore systems (Gerhard Wellein) – 2-BoF-LBM-SC11
- Case study: Hybrid-parallel sparse matrix-vector multiplication (Hager) – 3-spMVM-SC11
- Case study: Quantum chemistry towards exascale with QMC=Chem (Anthony Scemama) – 4-scemama-SC11
- Conclusions (Krammer) – 5-BoF-single-node-SC11