Workshop: 7th Workshop on Machine Learning in High Performance Environment
Authors: Sarunya Pumma and Abhinav Vishnu (Advanced Micro Devices (AMD) Inc)
Abstract: Deep Learning Recommendation Model (DLRM), a new neural network for
recommendation systems, introduces challenging requirements for deep
neural network training and inference. A DLRM model is typically too
large to fit in the memory of a single GPU. When running on multiple
GPUs, DLRM requires model parallelism and data parallelism for the
bottom part and top part of the model, respectively. Because of this
hybrid-parallel scheme, all-to-all communication is used to weld the
top and bottom parts together. We have observed that this all-to-all
communication is costly and is a bottleneck in DLRM training and inference.
In this paper, we reduce the communication volume by using
DLRM's properties to compress the
transferred data without information loss. We demonstrate the benefits
of our method by training DLRM TeraByte on AMD
Instinct MI100 accelerators. The experimental results show a 38% to 59%
improvement in time-to-solution for DLRM TeraByte
training in FP32 and mixed precision.