Presentation

Memory Optimizations for Sparse Linear Algebra on GPU Hardware

Session: MCHPC’21: Workshop on Memory Centric High Performance Computing

Event Type

Workshop

Time: Sunday, 14 November 2021, 4pm - 4:30pm CST

Location: Online

Description: An effort to maximize memory bandwidth utilization for a sparse linear algebra kernel executing on NVIDIA Tesla V100 and A100 Graphics Processing Units (GPUs) is described. The kernel consists of a block-sparse matrix-vector product and a series of forward/backward triangular solves. The computation is memory-bound and exhibits low arithmetic intensity. An earlier implementation yielded good memory performance on the V100 architecture. However, a new approach, which assigns a warp to six rows of the matrix, is proposed for the A100. In addition, two new features offered by the A100 architecture are explored. L2 residency control enables a portion of the L2 cache to be used for persistent data access, and the asynchronous copy instruction allows data to be loaded directly from main memory into shared memory. The new implementation improves memory bandwidth utilization from 71.5% to 81.2% of the peak available on the A100 architecture.
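
To illustrate why a block-sparse matrix-vector product is memory-bound, the following is a minimal CPU reference sketch, not the authors' GPU kernel. The abstract does not state the storage format, block size, or precision, so the block compressed sparse row (BSR) layout, the 2x2 blocks, the 8-byte values, and the 4-byte indices used here are all assumptions, and the names `bsr_matvec` and `arithmetic_intensity` are hypothetical. The traffic estimate also assumes a square matrix with each vector entry read once and each result written once.

```python
def bsr_matvec(n_block_rows, bs, row_ptr, col_idx, blocks, x):
    """Compute y = A @ x for a matrix stored in BSR (block CSR) form.

    blocks[k] is a dense bs-by-bs block (list of rows); col_idx[k] is its
    block column; row_ptr[i]:row_ptr[i + 1] spans the blocks of block row i.
    """
    y = [0.0] * (n_block_rows * bs)
    for i in range(n_block_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            block = blocks[k]
            for r in range(bs):
                acc = 0.0
                for c in range(bs):
                    acc += block[r][c] * x[j * bs + c]
                y[i * bs + r] += acc
    return y


def arithmetic_intensity(n_block_rows, bs, nnz_blocks,
                         value_bytes=8, index_bytes=4):
    """Estimate flops per byte moved for one BSR matrix-vector product.

    Assumes a square matrix, x read once, and y written once (illustrative
    assumptions, not measurements).
    """
    flops = 2 * nnz_blocks * bs * bs                 # one multiply + one add per entry
    bytes_moved = (
        nnz_blocks * bs * bs * value_bytes           # matrix block values
        + nnz_blocks * index_bytes                   # block column indices
        + (n_block_rows + 1) * index_bytes           # row pointers
        + 2 * n_block_rows * bs * value_bytes        # read x, write y
    )
    return flops / bytes_moved


# Toy 2x2-block matrix with three nonzero blocks (made-up values).
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
blocks = [
    [[1, 2], [3, 4]],
    [[1, 0], [0, 1]],
    [[2, 0], [0, 2]],
]
x = [1.0, 1.0, 1.0, 1.0]

y = bsr_matvec(2, 2, row_ptr, col_idx, blocks, x)
ai = arithmetic_intensity(2, 2, nnz_blocks=3)
```

Under these assumptions each matrix value is read once and used for just two floating-point operations, so the ratio stays below 0.25 flops per byte whatever the matrix size. That is why the achieved fraction of peak memory bandwidth, rather than the flop rate, is the natural performance metric for this kind of kernel.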

Author/Presenters

Aaron Walden

NASA Langley Research Center

Mohammad Zubair

Old Dominion University

Christopher Stone

National Institute of Aerospace

Eric Nielsen

NASA Langley Research Center
