SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Unbalanced Parallel I/O: An Often-Neglected Side Effect of Lossy Scientific Data Compression


Workshop: DRBSD-7: The 7th International Workshop on Data Analysis and Reduction for Big Scientific Data

Authors: Xinying Wang (University of Nevada, Reno); Lipeng Wan, Jieyang Chen, Qian Gong, and Ben Whitney (Oak Ridge National Laboratory (ORNL)); Jinzhen Wang (New Jersey Institute of Technology); Ana Gainaru (Oak Ridge National Laboratory (ORNL)); Qing Liu (New Jersey Institute of Technology); Norbert Podhorszki (Oak Ridge National Laboratory (ORNL)); Dongfang Zhao and Feng Yan (University of Nevada, Reno); and Scott Klasky (Oak Ridge National Laboratory (ORNL))


Abstract: Lossy compression techniques have demonstrated promising results in significantly reducing scientific data size while guaranteeing compression error bounds. However, one important yet often neglected side effect of lossy scientific data compression is its impact on the performance of parallel I/O. Our key observation is that the compressed data size is often highly skewed across processes in lossy scientific compression. To understand this behavior, we apply three lossy compressors, which are specifically designed and optimized for scientific data, to three real-world scientific applications. Our analysis demonstrates that the sizes of compressed data are always skewed, even if the original data is evenly decomposed among processes. We then systematically study this side effect of lossy scientific data compression and observe that the skewness in the sizes of the compressed data often leads to I/O imbalance, which can significantly reduce the efficiency of I/O bandwidth utilization if not properly handled.




