Full-day tutorial at Supercomputing Conference 2022 (SC 22), November 13-18, 2022, Dallas, TX, USA:

Node-Level Performance Engineering

Slides for download: NLPE_SC22_full.pdf

Authors/Presenters

Georg Hager¹, Thomas Gruber¹, and Gerhard Wellein²

¹ Erlangen National High Performance Computing Center (NHR@FAU)
² Department of Computer Science and Erlangen National High Performance Computing Center
Universität Erlangen-Nürnberg
Germany

{georg.hager,thomas.gruber,gerhard.wellein}@fau.de

Abstract

As we move towards exascale, the gap between peak and application performance continues to widen. Paradoxically, bad node-level performance leads to highly scalable code, but at the price of increased overall time to solution. Consequently, valuable resources are wasted, often on a massive scale. If the user cares about time to solution on any scale, optimal performance on the node level is often the key factor. We convey the architectural features of current processor chips, multiprocessor nodes, and accelerators, as far as they are relevant for the practitioner. Peculiarities like SIMD vectorization, shared vs. separate caches, bandwidth bottlenecks, and ccNUMA characteristics are introduced, and the influence of system topology and affinity on the performance of typical parallel programming constructs is demonstrated. Performance engineering and performance patterns are suggested as powerful tools that help the user understand the bottlenecks at hand and assess the impact of possible code optimizations. A cornerstone of these concepts is the roofline model, which is described in detail, including useful case studies, limits of its applicability, and possible refinements.
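For readers unfamiliar with the roofline model mentioned in the abstract, the basic form can be sketched in a few lines. This is only a minimal illustration, not material from the tutorial: the machine numbers (`PEAK`, `BW`) are hypothetical, the function name `roofline` is made up here, and the triad byte count assumes no write-allocate traffic.

```python
def roofline(peak_flops, mem_bandwidth, intensity):
    """Basic roofline estimate of attainable performance in flop/s.

    peak_flops     -- peak compute performance in flop/s
    mem_bandwidth  -- memory bandwidth in byte/s
    intensity      -- computational intensity of the loop in flop/byte
    """
    return min(peak_flops, intensity * mem_bandwidth)


# Hypothetical machine (assumed values, not measurements):
PEAK = 1000.0e9  # 1000 Gflop/s peak performance
BW = 100.0e9     # 100 GB/s memory bandwidth

# Triad-like loop a[i] = b[i] + s * c[i] in double precision:
# 2 flops per iteration, 3 * 8 bytes of traffic (write-allocate ignored).
triad_intensity = 2.0 / 24.0

triad_limit = roofline(PEAK, BW, triad_intensity)
memory_bound = triad_intensity * BW < PEAK

print(f"triad limit: {triad_limit / 1e9:.2f} Gflop/s, memory bound: {memory_bound}")
```

With these assumed numbers the loop is limited by memory bandwidth, far below the peak, which is the kind of bottleneck the model is meant to expose.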
