Authors: Manuel de Castro, Inmaculada Santamaria-Valenzuela, Sergio Miguel-LoĢpez, Yuri Torres, and Arturo Gonzalez-Escribano (University of Valladolid, Spain)
Abstract: Iterative stencil applications present a high degree of parallelism. The approach is appropriate for modern many-core systems, like GPUs. The huge arrays needed in many real problems, however, require multiple-GPUs memory. Manually programming multi-GPU solutions is complex, involving synchronization, communication and optimization problems. Previous specific frameworks for multi-GPU stencils present limitations on stencil types and target devices. Another approach is to build efficient solutions using modern portable heterogeneous programming models. We present a high-productivity parallel programming skeleton implemented as a C library function. It can derive the communication structure and kernel details from an abstract specification of the stencil pattern, splitting the computation to better overlap it with communications. Preliminary experimental results show a good scalability in a cluster with up to 16 GPUs. It also outperforms a state-of-the-art solution based on Celerity/SYCL. The presentation will include a short demo video using the artifact, and both on-site and on-line discussion.
Best Poster Finalist (BP): no
Poster: PDF
Poster summary: PDF
Back to Poster Archive Listing