Data-Aware Storage Tiering for Deep Learning
Event Type: Data Analytics, Data Management, File Systems and I/O
Time: Monday, 15 November 2021, 11:45am - 12:10pm CST
Description: DNN models trained on large datasets can perform rich deep learning tasks with high accuracy. However, feeding huge volumes of training data puts significant pressure on I/O subsystems: the entire dataset is reloaded in random order on every epoch to enable convergence, leaving very little scope for reuse. To address this challenge, we co-optimize data tiering and iteration in DNN training for any given dataset and model with bandwidth- and convergence-conscious mini-epoch training (MET). This approach can substantially reduce the I/O bandwidth required to provide sustained read throughput. Further, we introduce two different feedback mechanisms that adjust the repeating factor over each mini-epoch during training. We evaluated three different applications with MET; most of them work out of the box with modest MET parameters. The adaptive repeating-factor design recovers most of the accuracy lost to large MET parameters.
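The core mini-epoch idea can be sketched roughly as follows: partition one shuffled pass over the dataset into mini-epochs that fit in a fast storage tier, and repeat each cached mini-epoch several times before loading the next, trading some shuffle quality for far less cold-tier I/O. The function name, parameters, and the fixed repeating factor below are illustrative assumptions, not the authors' implementation (which adapts the repeating factor via feedback).

```python
import random

def mini_epoch_pass(dataset, num_mini_epochs, repeat_factor, seed=0):
    """Illustrative sketch of mini-epoch training (MET).

    Instead of re-reading the whole dataset in random order every epoch,
    split one global shuffle into `num_mini_epochs` chunks; each chunk is
    loaded into the fast tier once and iterated `repeat_factor` times,
    reshuffling only within the cached chunk.
    """
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)                      # one global shuffle per full pass
    chunk = len(indices) // num_mini_epochs
    visited = []
    for m in range(num_mini_epochs):
        # Load this mini-epoch's samples into the fast tier (one cold read).
        mini = indices[m * chunk:(m + 1) * chunk]
        for _ in range(repeat_factor):
            rng.shuffle(mini)                 # cheap reshuffle within the cache
            for idx in mini:
                visited.append(dataset[idx])  # stand-in for a training step
    return visited

samples = mini_epoch_pass(list(range(8)), num_mini_epochs=2, repeat_factor=3)
```

With `repeat_factor=3`, each sample is read from slow storage once per pass but used three times, so the sustained read bandwidth demanded from the cold tier drops by roughly that factor.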