Node-Level Performance Engineering
Time: Monday, 15 November 2021, 8am - 5pm CST
Description: As we move towards exascale, the gap between peak and application performance continues to widen. Paradoxically, slow code tends to be highly scalable. Consequently, valuable resources are wasted, often on a massive scale. If the user values resource efficiency at any scale, optimal performance at the node level is paramount. We convey the architectural features of current processor chips, multiprocessor nodes, and accelerators, insofar as they are relevant to the practitioner. Peculiarities such as SIMD, cache topology, bandwidth bottlenecks, and ccNUMA characteristics are introduced, and the influence of system topology and affinity on the performance of parallel code is demonstrated. Performance engineering is introduced as a powerful tool that helps the user understand the bottlenecks at hand and assess the impact of optimizations. A cornerstone of these concepts is the roofline model, which is described in detail, together with useful case studies and the limits of its applicability.
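As a rough illustration of the roofline idea mentioned above, here is a minimal sketch of its simplest form. Attainable performance is bounded by the lower of the peak arithmetic performance and the product of computational intensity (flops per byte of memory traffic) and memory bandwidth: P = min(P_peak, I * b_S). Several assumptions apply. The function name `roofline` is hypothetical. The machine figures (1000 GFlop/s peak, 100 GB/s bandwidth) are invented for illustration. The traffic count for the STREAM-style triad assumes 8-byte doubles and a write-back cache that performs a write-allocate on the store.

```python
def roofline(peak_gflops: float, bandwidth_gbs: float, intensity: float) -> float:
    """Simple roofline bound in GFlop/s: min(peak, intensity * bandwidth).

    `intensity` is in flop/byte, `bandwidth_gbs` in GB/s (hypothetical helper).
    """
    return min(peak_gflops, intensity * bandwidth_gbs)


# Triad kernel a[i] = b[i] + s * c[i]: one add and one multiply per iteration.
flops_per_iter = 2
# Assumed traffic per iteration: load b[i] and c[i], store a[i], plus one
# write-allocate transfer for a[i] (4 x 8 bytes with doubles).
bytes_per_iter = 4 * 8
intensity = flops_per_iter / bytes_per_iter  # flop/byte

# Hypothetical node: 1000 GFlop/s peak, 100 GB/s memory bandwidth.
p = roofline(1000.0, 100.0, intensity)
print(f"intensity = {intensity} flop/byte, attainable = {p} GFlop/s")
```

On this made-up node the triad reaches at most 6.25 GFlop/s, well under 1% of peak, so it is bandwidth-bound. Only kernels with an intensity above the machine balance (here 1000 / 100 = 10 flop/byte) would be limited by the arithmetic peak instead. The real model, as the description notes, comes with refinements and limits of applicability that this sketch leaves out.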