SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Enabling and Scaling the HPCG Benchmark on the Newest Generation Sunway Supercomputer with 42 Million Heterogeneous Cores


Authors: Qianchao Zhu (Center for Data Science, Peking University); Hao Luo (School of Mathematical Sciences, Peking University); Chao Yang (School of Mathematical Sciences, Peking University; National Engineering Laboratory for Big Data Analysis and Applications, Peking University); Mingshuo Ding (School of Electronics Engineering and Computer Science, Peking University); and Wanwang Yin and Xinhui Yuan (National Research Center of Parallel Computer Engineering and Technology, China)

Abstract: We study and evaluate performance optimization techniques for the HPCG benchmark on the newest-generation Sunway supercomputer. Specifically, a two-level blocking scheme is proposed to expose adequate parallelism in the symmetric Gauss-Seidel kernel while maintaining a fast convergence rate; a fine-grained kernel fusion technique is developed to alleviate the bandwidth load on the small-capacity local storage; and a low-overhead thread collaboration method is presented to move data efficiently between threads and hide the cost of doing so behind data transfer operations. Test results show that the optimized HPCG code exploits 73.0% of the theoretical memory bandwidth and scales to over 42 million heterogeneous cores with 95.5% weak-scaling efficiency and 5.91 PFLOPS performance. We also study how performance can be improved if the specific rules of HPCG are not fully obeyed, and design dependency-preserving parallelization and vectorization methods that further boost performance to 27.6 PFLOPS.
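For readers unfamiliar with the kernel named in the abstract, the following is a minimal serial sketch of a symmetric Gauss-Seidel sweep. It is not the authors' two-level blocking scheme, whose details the abstract does not give; it only illustrates the data dependency (each update reads values already updated earlier in the same sweep) that makes the kernel hard to parallelize. The function names, the small tridiagonal test matrix, and the sweep count are illustrative assumptions rather than anything taken from HPCG or the paper.

```python
def symgs_sweep(A, b, x):
    """One symmetric Gauss-Seidel sweep on dense A, updating x in place.

    A forward pass (i = 0..n-1) is followed by a backward pass (i = n-1..0).
    Each update of x[i] reads entries of x already updated in the same pass,
    which is the loop-carried dependency that limits parallelism.
    """
    n = len(b)
    for order in (range(n), range(n - 1, -1, -1)):
        for i in order:
            off_diag = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - off_diag) / A[i][i]
    return x


def residual_norm(A, b, x):
    """Max-norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
               for i in range(n))


def demo(n=8, sweeps=10):
    """Run a few sweeps on a toy diagonally dominant tridiagonal system.

    The matrix (4 on the diagonal, -1 on the off-diagonals) is an assumed
    stand-in, not the 27-point stencil matrix that HPCG uses.
    """
    A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
          for j in range(n)] for i in range(n)]
    b = [1.0] * n
    x = [0.0] * n
    history = [residual_norm(A, b, x)]
    for _ in range(sweeps):
        symgs_sweep(A, b, x)
        history.append(residual_norm(A, b, x))
    return history


if __name__ == "__main__":
    for k, r in enumerate(demo()):
        print(f"sweep {k}: residual max-norm = {r:.3e}")
```

Reordering the unknowns (for example by blocks or colors) removes some of these dependencies and exposes parallelism, but it generally changes the convergence behavior, which is the trade-off the abstract refers to.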



