Resilient Error-Bounded Lossy Compressor for Data Transfer
Cloud and Distributed Computing
Parallel Programming Languages and Models
Time: Thursday, 18 November 2021, 4pm - 4:30pm CST
Description: Today's exascale scientific applications or advanced instruments produce vast volumes of data that must be shared or transferred over networks or devices with relatively low bandwidth (e.g., WAN). Lossy compression is an important strategy for addressing this big-data problem; however, little work has been done to make it resilient against silent errors, which may occur during compression or data transfer. In this paper, we propose a resilient error-bounded lossy compressor. We design a new independent-block-wise model that decomposes each dataset into many independent sub-blocks, each compressed separately. We then design and implement a series of error detection/correction strategies for each stage of the SZ compressor. Our solution is the first algorithm-based fault tolerance (ABFT) algorithm for lossy compression. It incurs negligible execution overhead in the fault-free case. Should soft errors occur, it ensures that the decompressed data stays strictly within the user-specified error bound, with very limited degradation in compression ratio and low overhead.
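To make the independent-block idea concrete, here is a minimal sketch in Python. It is not the paper's algorithm or the SZ code: it assumes plain uniform quantization (step size twice the error bound) in place of SZ's prediction and encoding stages, and it uses a per-block CRC32 only to detect corruption, with no correction. The function names `compress_blocks` and `decompress_blocks` are hypothetical. It covers corruption introduced after compression (for example, in transit), not errors during compression itself.

```python
import struct
import zlib


def compress_blocks(values, error_bound, block_size):
    """Quantize each block on its own and attach a CRC32 to it.

    Assumption: uniform quantization with step 2 * error_bound, which keeps
    every reconstructed value within error_bound of the original.
    """
    step = 2.0 * error_bound
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        codes = [round(v / step) for v in chunk]  # integer quantization codes
        payload = struct.pack(f"<{len(codes)}q", *codes)  # 8 bytes per code
        blocks.append((payload, zlib.crc32(payload)))
    return blocks


def decompress_blocks(blocks, error_bound):
    """Rebuild values block by block and report blocks whose CRC32 fails.

    Values of a corrupted block are returned as None; other blocks are
    unaffected because no block depends on another.
    """
    step = 2.0 * error_bound
    values, corrupted = [], []
    for index, (payload, checksum) in enumerate(blocks):
        count = len(payload) // 8
        if zlib.crc32(payload) != checksum:
            corrupted.append(index)
            values.extend([None] * count)
            continue
        codes = struct.unpack(f"<{count}q", payload)
        values.extend(code * step for code in codes)
    return values, corrupted
```

Because each block carries its own checksum and is decoded without reference to its neighbours, a bit flip in one block is detected and confined to that block; a receiver could then re-request or recompute only the affected block rather than the whole dataset.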