An Extended Roofline Performance Model with PCI-E and Network Ceilings
Parallel Programming Languages and Models
Time: Monday, 15 November 2021, 11am - 11:30am CST
Description: In this work, we evaluate the utility of adding two new diagonal ceilings to the roofline model, one for PCI-E bandwidth and one for effective network bandwidth, to provide insight into how communication impacts the performance of large-scale parallel applications. The roofline performance analysis is based on two benchmark problems: scalar dense matrix addition and a dense symmetric eigenproblem with a complex matrix. The experiments were conducted on the NERSC Cori supercomputer at Lawrence Berkeley National Laboratory, on both the CPU-only and the CPU+GPU compute nodes. The study shows the value of incorporating these two new ceilings into the roofline model alongside the existing memory bandwidth and compute ceilings: they make performance bottlenecks easier to identify and so better guide the optimization process, particularly in the limit of diminishing strong and weak scaling. We highlight the importance of comparing measured application roofline points against ceilings customized for the communication and data-access patterns present. In this way, the effects of both throughput and latency can be captured in the model.
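As a rough illustration of the idea described above, the following Python sketch treats each data path (memory, PCI-E, network) as its own diagonal ceiling and takes the minimum over all of them together with the compute peak. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `effective_bandwidth` and `extended_roofline`, the simple latency-plus-transfer-time model, and every hardware number are hypothetical placeholders rather than measurements from Cori.

```python
def effective_bandwidth(message_bytes, latency_s, peak_bw):
    """Effective bytes/s for one message under an assumed
    latency + size/bandwidth cost model (an illustrative choice)."""
    return message_bytes / (latency_s + message_bytes / peak_bw)


def extended_roofline(peak_flops, bandwidths, intensities):
    """Return (attainable FLOP/s, name of the binding ceiling).

    bandwidths:  bytes/s for each data path, e.g. memory, pcie, network
    intensities: FLOPs per byte moved over that same path
    """
    bounds = {"compute": peak_flops}
    for path, bw in bandwidths.items():
        # Each diagonal ceiling is intensity * bandwidth for its path.
        bounds[path] = intensities[path] * bw
    limiter = min(bounds, key=bounds.get)
    return bounds[limiter], limiter


# Dense matrix addition C = A + B on n x n doubles:
# n*n FLOPs, and 3*n*n*8 bytes (read A, read B, write C).
n = 4096
ai = (n * n) / (3 * n * n * 8)  # 1/24 FLOP per byte

# Hypothetical hardware numbers, for illustration only.
bandwidths = {
    "memory": 900e9,   # device memory bandwidth
    "pcie": 16e9,      # host <-> device link
    "network": effective_bandwidth(1e6, 1e-6, 10e9),  # 1 MB messages
}
# Assume A and B cross PCI-E once and C returns, so the PCI-E intensity
# equals the memory intensity; the network intensity is an arbitrary guess.
intensities = {"memory": ai, "pcie": ai, "network": 1.0}

perf, limiter = extended_roofline(7e12, bandwidths, intensities)
print(f"attainable: {perf / 1e9:.2f} GFLOP/s, bound by {limiter}")
```

With these placeholder values the PCI-E ceiling lies far below the memory ceiling, which is the kind of bottleneck the extra ceilings are meant to expose. Shrinking the message size in `effective_bandwidth` lowers the network ceiling, which is one way latency can enter the picture.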