Unleashing the performance of bmSparse for the sparse matrix multiplication in GPUs

SC21 Proceedings

Workshop: 12th Workshop on Latest Advances in Scalable Algorithms for Large Scale Systems

Authors: Gonzalo Berger, Manuel Freire, Renzo Marini, Ernesto Dufrechou, and Pablo Ezzatti (University of the Republic, Uruguay)

Abstract: The evolution of data science and machine learning has increased the applicability of the sparse matrix multiplication (SpGEMM) kernel. Unlike better-known operations such as SpMV, in SpGEMM the nonzero pattern of the result is determined by the interaction between the nonzero patterns of the inputs, which poses serious challenges for the development of high-performance implementations on accelerators. Recent efforts on this subject aim to mitigate this irregularity through the use of block-based sparse storage formats, obtaining promising results on accelerators such as GPUs. In this work, we study the bmSparse format [1] and propose optimizations that target the principal bottlenecks of the original SpGEMM implementation for NVIDIA GPUs. We evaluate the proposal using nine sparse matrices of different sizes, showing remarkable speedups with respect to cuSPARSE's CSR variant.
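
As an illustration of why the output pattern in SpGEMM depends on both inputs, the following is a minimal sketch of the symbolic (pattern-only) phase of C = A * B on CSR inputs. It is not the bmSparse algorithm or the authors' implementation; the function name `spgemm_pattern` and the small example matrices are hypothetical, chosen only for illustration, and numerical cancellation is ignored.

```python
def spgemm_pattern(a_indptr, a_indices, b_indptr, b_indices):
    """Return the CSR pattern (indptr, indices) of C = A * B.

    Hypothetical helper for illustration: only the structure is computed,
    no values. Row i of C is the union of the rows of B selected by the
    column indices of the nonzeros in row i of A.
    """
    c_indptr = [0]
    c_indices = []
    n_rows = len(a_indptr) - 1
    for i in range(n_rows):
        cols = set()
        # Each nonzero A[i, k] contributes the whole pattern of row k of B.
        for k in a_indices[a_indptr[i]:a_indptr[i + 1]]:
            cols.update(b_indices[b_indptr[k]:b_indptr[k + 1]])
        c_indices.extend(sorted(cols))
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices


# A (3x3): row 0 has columns {0, 2}, row 1 has column {1}, row 2 is empty.
a_indptr, a_indices = [0, 2, 3, 3], [0, 2, 1]
# B (3x3): row 0 has column {1}, row 1 is empty, row 2 has columns {0, 1}.
b_indptr, b_indices = [0, 1, 1, 3], [1, 0, 1]

c_indptr, c_indices = spgemm_pattern(a_indptr, a_indices, b_indptr, b_indices)
print(c_indptr, c_indices)
```

In this example, row 1 of A is nonempty but row 1 of C is empty, because the only row of B it selects has no nonzeros. The size of each output row therefore cannot be known from A or B alone, which is the irregularity the abstract refers to.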
