SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations

Authors: Kun Li (Institute of Computing Technology, Chinese Academy of Sciences; Chinese Academy of Sciences); Liang Yuan and Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); and Yue Yue (Institute of Computing Technology, Chinese Academy of Sciences; Chinese Academy of Sciences)

Abstract: Stencil computation is one of the most important kernels in scientific and engineering applications. Much prior work has focused on vectorization techniques that exploit in-core data parallelism. These techniques, however, either incur spatial data conflicts or hurt data locality when integrated with tiling. In this paper, we devise a novel spatial computation folding that reduces the data reorganization overhead of vectorization while preserving the data locality of tiling in the data space. We then propose a temporal computation folding approach enhanced with shifts reusing, tessellate tiling, and semi-automatic code generation, which further reduces redundant arithmetic calculations and exploits register reuse along the time dimension. Experimental results on AVX2 and AVX-512 CPUs show that our approach obtains significant performance improvements over state-of-the-art techniques.
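As a rough illustration of what a stencil sweep is, and of the algebraic idea that consecutive time steps of a linear stencil can be combined, the following minimal sketch may help. It is not the authors' implementation, which operates on vector registers and is considerably more involved. The function names `step` and `fold`, the 1D three-point weights, and the fixed-boundary treatment are all assumptions made for this example.

```python
def step(u, w):
    """Apply one sweep of a 1D stencil with odd-length weights w.

    Cells closer than the stencil radius to either end are left unchanged
    (an assumed boundary treatment, chosen only to keep the sketch short).
    """
    r = len(w) // 2
    out = list(u)
    for i in range(r, len(u) - r):
        out[i] = sum(w[k] * u[i - r + k] for k in range(len(w)))
    return out


def fold(w):
    """Return the weights of the stencil equivalent to applying w twice.

    For a linear, constant-coefficient stencil this is the discrete
    convolution of w with itself.
    """
    out = [0.0] * (2 * len(w) - 1)
    for a, wa in enumerate(w):
        for b, wb in enumerate(w):
            out[a + b] += wa * wb
    return out


# Hypothetical three-point averaging stencil and input data.
w3 = [0.25, 0.5, 0.25]
u0 = [float((7 * i) % 11) for i in range(16)]

two_steps = step(step(u0, w3), w3)   # two sweeps of the 3-point stencil
w5 = fold(w3)                        # equivalent 5-point weights
one_folded = step(u0, w5)            # one sweep of the folded stencil

# The two results agree on cells at least two positions from either end.
interior = range(2, len(u0) - 2)
max_diff = max(abs(two_steps[i] - one_folded[i]) for i in interior)
```

In this toy setting the folded stencil produces the two-step result in a single pass over the data. Near the boundaries the two results differ, because the fixed boundary cells are handled differently by the two variants.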

