Unleashing the Performance of bmSparse for the Sparse Matrix Multiplication in GPUs
Time: Friday, 19 November 2021, 10:30am - 10:50am CST
Description: The evolution of data science and machine learning has increased the applicability of the sparse matrix-matrix multiplication (SpGEMM) kernel. Unlike better-known operations such as SpMV, in SpGEMM the nonzero pattern of the result is determined by the interaction between the nonzero patterns of the inputs, which poses serious challenges for the development of high-performance implementations on accelerators. Recent efforts on this subject aim to mitigate this irregularity through the use of block-based sparse storage formats, obtaining promising results on accelerators such as GPUs. In this work, we study the bmSparse format and propose optimizations that address the principal bottlenecks of the original SpGEMM implementation for NVIDIA GPUs. We evaluate the proposal using nine sparse matrices of different sizes, showing remarkable speedups with respect to cuSPARSE's CSR variant.
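To illustrate why the output pattern is hard to predict, the following is a minimal sketch of the "symbolic" step of SpGEMM on plain CSR index arrays. It is not the bmSparse algorithm or the cuSPARSE implementation: the function name `spgemm_pattern` and the small example matrices are assumptions made only for illustration. Row i of C = A * B receives the union of the rows of B selected by the nonzero columns of row i of A, so the size of each output row is unknown until both input patterns have been inspected.

```python
def spgemm_pattern(a_indptr, a_indices, b_indptr, b_indices):
    """Return the CSR (indptr, indices) nonzero pattern of C = A @ B.

    Hypothetical helper for illustration only. It computes the structural
    pattern and ignores numerical cancellation.
    """
    c_indptr = [0]
    c_indices = []
    for i in range(len(a_indptr) - 1):
        cols = set()
        # Each nonzero A[i, k] pulls in every column of row k of B.
        for k in a_indices[a_indptr[i]:a_indptr[i + 1]]:
            cols.update(b_indices[b_indptr[k]:b_indptr[k + 1]])
        c_indices.extend(sorted(cols))
        c_indptr.append(len(c_indices))
    return c_indptr, c_indices


# Pattern of A (3x3): row 0 -> {0, 2}, row 1 -> {1}, row 2 -> {}
a_indptr, a_indices = [0, 2, 3, 3], [0, 2, 1]
# Pattern of B (3x3): row 0 -> {1}, row 1 -> {0, 2}, row 2 -> {1, 2}
b_indptr, b_indices = [0, 1, 3, 5], [1, 0, 2, 1, 2]

c_indptr, c_indices = spgemm_pattern(a_indptr, a_indices, b_indptr, b_indices)
print(c_indptr, c_indices)  # [0, 2, 4, 4] [1, 2, 0, 2]
```

In SpMV, by contrast, the output is a dense vector whose length is known in advance, so no such pattern discovery is needed.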