Authors: Sihuan Li (University of California, Riverside); Sheng Di (Argonne National Laboratory (ANL)); Kai Zhao (University of California, Riverside); Xin Liang (Missouri University of Science and Technology); Zizhong Chen (University of California, Riverside); and Franck Cappello (Argonne National Laboratory (ANL))
Abstract: Today's exascale scientific applications and advanced instruments produce vast volumes of data that must be shared or transferred over networks and devices with relatively low bandwidth (e.g., WANs). Lossy compression is an important strategy for addressing this big-data problem; however, little work has been done to make it resilient against silent errors, which may occur during compression or data transfer. In this paper, we propose a resilient error-bounded lossy compressor. We design a new independent-block-wise model that decomposes each dataset into many independent sub-blocks, each of which is compressed separately. We then design and implement a series of error detection and correction strategies for each stage of the SZ compressor. Our solution is the first algorithm-based fault tolerance (ABFT) algorithm for lossy compression, and it incurs negligible execution overhead in the fault-free case. Should soft errors occur, it ensures that the decompressed data remain strictly within the user-specified error bound, with very limited degradation of the compression ratio and low overhead.
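The independent-block-wise idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not reproduce SZ or the paper's ABFT scheme: it assumes plain uniform quantization as the error-bounded step and a generic CRC32 per block for detection only (no correction). All function names (`compress_block`, `decompress_block`, `compress`, `decompress`) are hypothetical and chosen for illustration.

```python
import struct
import zlib


def compress_block(values, error_bound):
    """Quantize one block into bins of width 2*error_bound and attach a CRC32.

    Assumption: uniform quantization stands in for a real lossy compressor.
    """
    codes = [round(v / (2 * error_bound)) for v in values]
    payload = struct.pack(f"<{len(codes)}q", *codes)
    return payload, zlib.crc32(payload)


def decompress_block(payload, checksum, error_bound):
    """Verify the block checksum, then reconstruct the bin centers."""
    if zlib.crc32(payload) != checksum:
        raise ValueError("silent error detected in block")
    count = len(payload) // 8
    codes = struct.unpack(f"<{count}q", payload)
    return [c * 2 * error_bound for c in codes]


def compress(data, block_size, error_bound):
    """Split the data into independent blocks and compress each on its own."""
    return [
        compress_block(data[i:i + block_size], error_bound)
        for i in range(0, len(data), block_size)
    ]


def decompress(blocks, error_bound):
    """Return (values, corrupted_block_indices).

    A corrupted block yields None placeholders; other blocks are unaffected
    because no block depends on any other.
    """
    values, corrupted = [], []
    for index, (payload, checksum) in enumerate(blocks):
        try:
            values.extend(decompress_block(payload, checksum, error_bound))
        except ValueError:
            corrupted.append(index)
            values.extend([None] * (len(payload) // 8))
    return values, corrupted
```

Because every block carries its own checksum and is decoded without reference to its neighbors, a bit flip in one block is confined to that block, and the remaining blocks still decompress within the error bound. A real resilient compressor would also need to protect the computation itself, which this sketch does not attempt.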