Memory Optimizations for Sparse Linear Algebra on GPU Hardware
Event Type
Workshop
Tags
Online Only
Architectures
Memory Systems
Parallel Programming Languages and Models
System Software and Runtime Systems
Registration Categories
W
Time
Sunday, 14 November 2021, 4pm - 4:30pm CST
Location
Online
Description
An effort to maximize memory bandwidth utilization
for a sparse linear algebra kernel executing on NVIDIA
Tesla V100 and A100 Graphics Processing Units (GPUs) is
described. The kernel consists of a block-sparse matrix-vector
product and a series of forward/backward triangular solves. The
computation is memory-bound and exhibits low arithmetic intensity.
An earlier implementation yielded
good memory performance on the V100 architecture. However, a
new approach, which assigns a warp to six rows of the matrix, is
proposed for the A100. In addition, two new features offered by
the A100 architecture are explored. L2 residency control enables
a portion of the L2 cache to be used for persistent data access,
and the asynchronous copy instruction allows data to be loaded
directly from main memory into shared memory. The new
implementation improves memory bandwidth utilization from 71.5%
to 81.2% of the peak available on the A100 architecture.
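
As a rough illustration of why a block-sparse matrix-vector product is memory-bound, the sketch below gives a plain-Python reference version together with an estimate of its arithmetic intensity. It is not the kernel presented in the talk. The block compressed sparse row (BSR) layout, the function names `bsr_matvec` and `flops_per_byte`, the 8-byte values, and the 4-byte indices are all assumptions made for illustration; the abstract does not state the storage format, block size, or precision.

```python
def bsr_matvec(n_block_rows, block_size, row_ptr, col_idx, blocks, x):
    """Return y = A @ x for a square matrix A stored in an assumed BSR layout.

    row_ptr[i]..row_ptr[i+1] indexes the nonzero blocks of block row i,
    col_idx[k] is the block column of block k, and blocks[k] is a dense
    block_size x block_size block given as a list of rows.
    """
    b = block_size
    y = [0.0] * (n_block_rows * b)
    for i in range(n_block_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            block = blocks[k]
            for r in range(b):
                acc = 0.0
                for c in range(b):
                    acc += block[r][c] * x[j * b + c]
                y[i * b + r] += acc
    return y


def flops_per_byte(n_blocks, n_block_rows, block_size,
                   value_bytes=8, index_bytes=4):
    """Estimate arithmetic intensity, assuming each array is moved once."""
    b = block_size
    flops = 2 * n_blocks * b * b  # one multiply and one add per stored entry
    bytes_moved = (
        n_blocks * b * b * value_bytes          # matrix values
        + n_blocks * index_bytes                # block column indices
        + (n_block_rows + 1) * index_bytes      # block row pointers
        + 2 * n_block_rows * b * value_bytes    # x read once, y written once
    )
    return flops / bytes_moved


# Hypothetical 4x4 matrix made of three 2x2 blocks.
row_ptr = [0, 2, 3]
col_idx = [0, 1, 1]
blocks = [[[1, 2], [3, 4]], [[0, 1], [1, 0]], [[2, 0], [0, 2]]]
x = [1, 1, 2, 3]
y = bsr_matvec(2, 2, row_ptr, col_idx, blocks, x)
intensity = flops_per_byte(n_blocks=3, n_block_rows=2, block_size=2)
```

Under these assumptions every stored matrix entry contributes two floating-point operations but costs at least eight bytes of traffic, so the intensity stays below 0.25 flops per byte regardless of matrix size. That is consistent with the abstract's focus on bandwidth utilization rather than floating-point throughput.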