Half-day tutorial at the third Saudi Arabian High Performance Computing (SAHPC) conference, December 1-3, 2012, at King Abdullah University of Science and Technology (KAUST) in Thuwal, Saudi Arabia:
Performance Engineering on Multi- and Manycores
As shown in the tutorial: SAHPC-Tutorial-2012-small.pdf
Including skipped slides: SAHPC-Tutorial-2012-full.pdf
Since the blog system does not allow uploading Excel files, here is a link to my Dropbox: Excel sheet for the power model
Erlangen Regional Computing Center
University of Erlangen-Nuremberg
The advent of multi- and manycore chips has further widened the gap between peak and application performance for many scientific codes. Paradoxically, bad node-level performance helps to “efficiently” scale to massive parallelism, but at the price of increased overall time to solution. We convey the architectural features of current processor chips, multiprocessor nodes, and accelerators, as far as they are relevant for high-performance simulation. We identify typical bottlenecks and point out the features and problems of the dominant programming models, MPI and OpenMP. Simple performance models on the chip and node level are introduced as powerful tools for understanding what “optimal performance” is, which optimizations could improve it, and what benefit to expect.
We also comment on typical performance and scalability patterns and how they can be used to improve the energy efficiency of simulations. Finally, all these strategies are embedded into a structured “performance engineering” process, which we propose as a guiding principle in all HPC-related efforts.
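The scaling paradox mentioned in the abstract can be illustrated with a toy model. This is a minimal sketch and not taken from the tutorial slides: it assumes perfectly parallelizable work plus a fixed communication overhead for any parallel run, and all function names and numbers are hypothetical.

```python
# Toy scaling model (an assumption for illustration, not the tutorial's model):
# runtime on n processes = serial compute time / n + fixed communication time.

def time_to_solution(n_procs, t_serial, t_comm):
    """Runtime on n_procs processes; no communication cost on a single process."""
    if n_procs == 1:
        return t_serial
    return t_serial / n_procs + t_comm


def parallel_efficiency(n_procs, t_serial, t_comm):
    """Speedup over the single-process run, divided by the process count."""
    speedup = t_serial / time_to_solution(n_procs, t_serial, t_comm)
    return speedup / n_procs


N_PROCS = 100
T_COMM = 1.0          # hypothetical communication overhead (arbitrary time units)
T_SERIAL_FAST = 100.0  # well-optimized node-level code
T_SERIAL_SLOW = 400.0  # same algorithm, four times slower on the node

for label, t_serial in (("fast", T_SERIAL_FAST), ("slow", T_SERIAL_SLOW)):
    t = time_to_solution(N_PROCS, t_serial, T_COMM)
    eff = parallel_efficiency(N_PROCS, t_serial, T_COMM)
    print(f"{label} code: time = {t:.1f}, parallel efficiency = {eff:.0%}")
```

Under these assumptions the slow code reaches the higher parallel efficiency because its communication overhead is small relative to its inflated compute time, yet it still takes longer to finish than the fast code.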