Semantic-Aware Lossless Data Compression for Deep Learning Recommendation Model (DLRM)

SC21 Proceedings

Workshop: 7th Workshop on Machine Learning in High Performance Environment

Authors: Sarunya Pumma and Abhinav Vishnu (Advanced Micro Devices (AMD) Inc)

Abstract: The Deep Learning Recommendation Model (DLRM), a new neural network for recommendation systems, introduces challenging requirements for deep neural network training and inference. A DLRM model is typically too large to fit in the memory of a single GPU. When running on multiple GPUs, DLRM requires model parallelism for the bottom part of the model and data parallelism for the top part. Because of this hybrid-parallel scheme, all-to-all communication is used to weld the top and bottom parts together. We have observed that this all-to-all communication is costly and is a bottleneck in DLRM training and inference.
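The hybrid-parallel exchange described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions: the bottom part is taken to be one embedding table per worker (model parallel), the top part consumes a per-worker slice of the batch (data parallel), and the names `simulate_all_to_all` and `hybrid_parallel_exchange` are hypothetical helpers, not anything taken from the paper.

```python
import numpy as np


def simulate_all_to_all(send):
    """Simulate an all-to-all: send[src][dst] ends up at recv[dst][src]."""
    n = len(send)
    return [[send[src][dst] for src in range(n)] for dst in range(n)]


def hybrid_parallel_exchange(tables, indices, num_workers):
    """Switch from a model-parallel to a data-parallel layout.

    tables[w]  : (rows, dim) embedding table owned by worker w (assumed layout).
    indices[w] : lookups into table w for every sample of the global batch.
    Returns one array per worker with shape (local_batch, num_workers, dim):
    all tables' outputs, but only for that worker's slice of the batch.
    """
    batch = indices.shape[1]
    assert batch % num_workers == 0, "batch must split evenly across workers"
    local = batch // num_workers

    send = []
    for w in range(num_workers):
        # Model-parallel step: worker w looks up its table for the whole batch.
        looked_up = tables[w][indices[w]]  # shape (batch, dim)
        # Split the result by destination worker (one batch slice each).
        send.append([looked_up[d * local:(d + 1) * local]
                     for d in range(num_workers)])

    recv = simulate_all_to_all(send)
    # Data-parallel step: each worker stacks the slices it received, per table.
    return [np.stack(recv[w], axis=1) for w in range(num_workers)]
```

In this layout every worker sends a slice of its lookup results to every other worker, which is why the exchanged volume grows with both the batch size and the number of tables.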

In this paper, we reduce the communication volume by using DLRM's properties to compress the transferred data without information loss. We demonstrate the benefits of our method by training DLRM TeraByte on AMD Instinct MI100 accelerators. The experimental results show a 38%-59% improvement in the time-to-solution of DLRM TeraByte training for FP32 and mixed precision.
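The abstract does not say which DLRM properties the compression relies on. As a purely illustrative sketch, and not the authors' method, the code below assumes one plausible property: the same embedding row is often looked up several times within a batch, so a sender could transmit each distinct row once plus a small index array. The function names `compress` and `decompress` are hypothetical.

```python
import numpy as np


def compress(indices, table):
    """Return (unique_rows, inverse) for the rows selected by `indices`.

    Assumption (not from the paper): repeated lookups in a batch are common,
    so sending each distinct row once reduces the transferred volume.
    """
    indices = np.asarray(indices).ravel()
    unique_ids, inverse = np.unique(indices, return_inverse=True)
    return table[unique_ids], inverse.astype(np.int32)


def decompress(unique_rows, inverse):
    """Rebuild the full (batch, dim) lookup result on the receiving side."""
    return unique_rows[inverse]


table = np.arange(20, dtype=np.float32).reshape(5, 4)
indices = np.array([3, 1, 3, 3, 0, 1])

payload, inverse = compress(indices, table)
restored = decompress(payload, inverse)

# Lossless: the receiver recovers exactly what an uncompressed send would carry.
assert np.array_equal(restored, table[indices])
```

Here the payload holds 3 rows instead of 6, at the cost of one 32-bit index per sample. Whether such a scheme pays off depends on how often lookups repeat and on the embedding dimension.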
