SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Evaluation of two topology-aware heuristics on level-3 BLAS library for multi-GPU platforms


Workshop: PAW-ATM 2021: The 4th Annual Parallel Applications Workshop, Alternatives To MPI+X

Authors: Thierry Gautier (LIP/INRIA/CNRS/UCL) and Joao Vicente Ferreira Lima (Universidade Federal de Santa Maria)


Abstract: GPUs now dominate the market in terms of performance per watt, and numerous research works have provided GPU-accelerated implementations of the Basic Linear Algebra Subprograms (BLAS). Several software libraries have been developed to exploit the performance of accelerator-equipped systems, but with multiple GPUs the achieved performance may remain far from the platform's peak. This paper presents two runtime heuristics that improve performance when task-based programs run on heterogeneous architectures such as multi-GPU systems. The first is a topology-aware policy that takes into account the heterogeneity of the high-speed links interconnecting the GPUs. The second is an optimistic heuristic that favors device-to-device communication. Both have been implemented in XKBlas, a level-3 BLAS library. We conducted experiments on an NVIDIA DGX-1 with up to 8 V100 GPUs using a set of BLAS routines. Experimental results on these kernels showed that XKBlas outperformed most of the other implementations, even when the overhead of creating and scheduling dynamic tasks is included.
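To make the idea of a topology-aware policy concrete, here is a minimal sketch of how a runtime might choose where a GPU should fetch a data tile from. It assumes a per-pair GPU link bandwidth matrix and a host link bandwidth (both illustrative numbers, not measured DGX-1 values), and it assumes the host always holds a valid copy of the tile. The function name `pick_source` and the `HOST` sentinel are hypothetical. This is not the XKBlas implementation, only an illustration of preferring the fastest available peer link and falling back to the host.

```python
# Minimal sketch of topology-aware source selection (hypothetical, not XKBlas code).
# Assumption: the host always holds a valid copy of the tile, so it is the fallback.
HOST = -1  # sentinel meaning "copy from host memory"


def pick_source(dest, holders, bw, host_bw):
    """Choose where GPU `dest` should fetch a tile from.

    dest    -- index of the GPU that needs the tile
    holders -- set of GPU indices that currently hold a valid copy
    bw      -- bw[src][dst]: direct link bandwidth in GB/s (0 = no direct link)
    host_bw -- host-to-GPU bandwidth in GB/s
    """
    if dest in holders:
        return dest  # already valid locally: no transfer needed
    best, best_bw = HOST, host_bw
    # Iterate in index order so ties are broken deterministically (lowest index wins).
    for src in sorted(holders):
        if bw[src][dest] > best_bw:
            best, best_bw = src, bw[src][dest]
    return best


# Illustrative 4-GPU topology: some pairs have faster links than others,
# and GPUs 0 and 3 have no direct link at all.
BW = [
    [0, 50, 25, 0],
    [50, 0, 0, 25],
    [25, 0, 0, 50],
    [0, 25, 50, 0],
]

if __name__ == "__main__":
    print(pick_source(0, {1, 2}, BW, host_bw=12))  # GPU 1 (fastest peer link)
    print(pick_source(0, {3}, BW, host_bw=12))     # HOST (-1): no direct link 3 -> 0
```

A topology-blind policy would treat all holders as equivalent. On platforms such as the DGX-1, where not every GPU pair is directly connected, that choice can route transfers over slower paths.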

