Towards an Efficient Parallel Skeleton for Generic Iterative Stencil Computations in Distributed GPUs
TimeThursday, 18 November 20218:30am - 5pm CST
LocationSecond Floor Atrium
DescriptionIterative stencil applications present a high degree of parallelism. The approach is appropriate for modern many-core systems, like GPUs. The huge arrays needed in many real problems, however, require multiple-GPUs memory. Manually programming multi-GPU solutions is complex, involving synchronization, communication and optimization problems. Previous specific frameworks for multi-GPU stencils present limitations on stencil types and target devices. Another approach is to build efficient solutions using modern portable heterogeneous programming models. We present a high-productivity parallel programming skeleton implemented as a C library function. It can derive the communication structure and kernel details from an abstract specification of the stencil pattern, splitting the computation to better overlap it with communications. Preliminary experimental results show a good scalability in a cluster with up to 16 GPUs. It also outperforms a state-of-the-art solution based on Celerity/SYCL. The presentation will include a short demo video using the artifact, and both on-site and on-line discussion.