FreeLunch: Compression-based GPU Memory Management for Convolutional Neural Networks

SC21 Proceedings

Workshop: MCHPC’21: Workshop on Memory Centric High Performance Computing

Authors: Shaurya Patel, Tongping Liu, and Hui Guan (University of Massachusetts, Amherst)

Abstract: Convolutional Neural Networks (CNNs) are becoming deeper and wider to improve task accuracy. As a result, GPU memory quickly becomes the performance bottleneck, because its capacity cannot keep pace with the growing memory requirements of CNN models. Existing solutions rely on techniques such as swapping and recomputation to cope with the memory shortage. However, they suffer performance degradation from either the limited CPU-GPU bandwidth (swapping) or the significant cost of recomputation. This paper proposes FreeLunch, a compression-based technique that actively compresses intermediate data to reduce the memory footprint of large CNN models. In our evaluation, FreeLunch consumes up to 35% less memory and achieves up to 70% higher throughput than swapping and recomputation.
