ndzip-gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs
Event Type
Paper

Algorithms
Cloud and Distributed Computing
Data Management
Parallel Programming Languages and Models
TP
Time: Thursday, 18 November 2021, 3:30pm - 4pm CST
Location: 220-221
Description: Lossless data compression is a promising software approach for reducing the bandwidth requirements of scientific applications on accelerator clusters without introducing approximation errors. Suitable compressors must be able to effectively compact floating-point data while saturating the system interconnect to avoid introducing unnecessary latencies.
We present ndzip-gpu, a novel, highly efficient GPU parallelization scheme for the block compressor ndzip, which has recently set a new milestone in CPU floating-point compression speeds.
Through the combination of intra-block parallelism and efficient memory access patterns, ndzip-gpu achieves high resource utilization in multi-dimensional data decorrelation. We further introduce an efficient warp-cooperative primitive for vertical bit packing, providing a high-throughput data reduction step.
Using a representative set of scientific data, we demonstrate that ndzip-gpu consistently outperforms all other lossless floating-point compressors accessible to us on NVIDIA Volta and Ampere hardware in both throughput and compression ratio achieved.
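To illustrate the idea behind vertical bit packing, the following is a minimal sequential sketch in Python. It is not the warp-cooperative GPU primitive from the paper. It assumes chunks of 32 words of 32 bits each and a one-word header marking the non-zero bit rows, and the function names (`transpose_bits`, `pack_chunk`, `unpack_chunk`) are hypothetical. The chunk is treated as a 32x32 bit matrix and transposed, so that bit positions that are zero in every word become all-zero rows that can be dropped.

```python
WORD_BITS = 32  # assumed chunk geometry: 32 words of 32 bits each


def transpose_bits(words):
    """Transpose a 32x32 bit matrix.

    Output row i collects bit i (counted from the most significant bit)
    of every input word.
    """
    assert len(words) == WORD_BITS
    rows = [0] * WORD_BITS
    for col, word in enumerate(words):
        for row in range(WORD_BITS):
            bit = (word >> (WORD_BITS - 1 - row)) & 1
            rows[row] |= bit << (WORD_BITS - 1 - col)
    return rows


def pack_chunk(words):
    """Return (header, payload): a bitmask of non-zero rows and those rows."""
    header = 0
    payload = []
    for i, row in enumerate(transpose_bits(words)):
        if row != 0:
            header |= 1 << (WORD_BITS - 1 - i)
            payload.append(row)
    return header, payload


def unpack_chunk(header, payload):
    """Invert pack_chunk by re-inserting zero rows and transposing back."""
    it = iter(payload)
    rows = [
        next(it) if (header >> (WORD_BITS - 1 - i)) & 1 else 0
        for i in range(WORD_BITS)
    ]
    # Transposing a square bit matrix twice restores the original.
    return transpose_bits(rows)


if __name__ == "__main__":
    # Small residuals: only the two lowest bit positions are ever set.
    residuals = [i % 4 for i in range(WORD_BITS)]
    header, payload = pack_chunk(residuals)
    print(f"{len(payload)} of {WORD_BITS} rows kept")
    assert unpack_chunk(header, payload) == residuals
```

When decorrelation leaves small residuals, most high-order bit rows are zero, so the chunk shrinks to the header plus a few payload words. A GPU version would distribute the transposition and the zero-row compaction across the threads of a warp rather than looping.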