Authors: Fabian Knorr, Peter Thoman, and Thomas Fahringer (University of Innsbruck)
Abstract: Lossless data compression is a promising software approach for reducing the bandwidth requirements of scientific applications on accelerator clusters without introducing approximation errors. Suitable compressors must be able to effectively compact floating-point data while saturating the system interconnect to avoid introducing unnecessary latencies.
We present ndzip-gpu, a novel, highly-efficient GPU parallelization scheme for the block compressor ndzip, which has recently set a new milestone in CPU floating-point compression speeds.
Through the combination of intra-block parallelism and efficient memory access patterns, ndzip-gpu achieves high resource utilization in multi-dimensional data decorrelation. We further introduce an efficient warp-cooperative primitive for vertical bit packing, providing a high-throughput data reduction step.
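The vertical bit packing step can be illustrated with a small sequential model. The sketch below is a minimal Python illustration, not ndzip-gpu's code. It makes these assumptions:

- Chunks hold 32 unsigned 32-bit words.
- The words have already been decorrelated, so most of their leading bits are zero.
- The function names (`transpose_bits`, `pack_chunk`, `unpack_chunk`) and the header layout are hypothetical.

The sketch transposes each chunk so that every output word collects one bit position across all 32 inputs. It then stores a 32-bit header mask followed by only the non-zero bit planes.

```python
from typing import List, Tuple

# Assumption: 32-bit words (e.g. single-precision values reinterpreted as
# unsigned integers after decorrelation). A 64-bit variant works the same way.
WORD_BITS = 32


def transpose_bits(words: List[int]) -> List[int]:
    """Bit-transpose a 32x32 bit matrix.

    Plane i holds bit (31 - i) of every input word, with input word j placed
    at bit position (31 - j). The transform is its own inverse.
    """
    assert len(words) == WORD_BITS
    planes = []
    for bit in range(WORD_BITS - 1, -1, -1):  # most significant bit first
        plane = 0
        for w in words:
            plane = (plane << 1) | ((w >> bit) & 1)
        planes.append(plane)
    return planes


def pack_chunk(words: List[int]) -> Tuple[int, List[int]]:
    """Hypothetical vertical bit packing: header mask plus non-zero bit planes."""
    header = 0
    payload = []
    for i, plane in enumerate(transpose_bits(words)):
        if plane != 0:
            header |= 1 << (WORD_BITS - 1 - i)  # mark plane i as present
            payload.append(plane)
    return header, payload


def unpack_chunk(header: int, payload: List[int]) -> List[int]:
    """Reinsert all-zero planes according to the header, then transpose back."""
    planes = []
    remaining = iter(payload)
    for i in range(WORD_BITS):
        present = (header >> (WORD_BITS - 1 - i)) & 1
        planes.append(next(remaining) if present else 0)
    return transpose_bits(planes)


# Usage: small residuals only occupy the low 3 bits, so 3 planes survive.
residuals = [i % 5 for i in range(32)]
header, payload = pack_chunk(residuals)
print(f"header mask {header:#b}, payload {len(payload)} of {len(residuals)} words")
# prints: header mask 0b111, payload 3 of 32 words
assert unpack_chunk(header, payload) == residuals
```

On a GPU, the 32 words of a chunk map naturally onto the 32 lanes of a warp. Each bit plane could then be formed with a single warp vote, such as CUDA's `__ballot_sync`. This is one plausible mapping, not necessarily how ndzip-gpu implements its warp-cooperative primitive.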
Using a representative set of scientific data, we demonstrate that ndzip-gpu consistently outperforms all other lossless floating-point compressors accessible to us on NVIDIA Volta and Ampere hardware, in both throughput and achieved compression ratio.