SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Distributed Training for High Resolution Images: A Domain and Spatial Decomposition Approach


Workshop: RSDHA: Redefining Scalability for Diversely Heterogeneous Architectures

Authors: Aristeidis Tsaris, Jacob Hinkle, Dalton Lunga, and Philipe Dias (Oak Ridge National Laboratory (ORNL))


Abstract: In this work, we developed two PyTorch libraries, built on the PyTorch RPC interface, for distributed deep learning on high-resolution images. The spatial decomposition library enables distributed training on very large images, which would otherwise not be possible on a single GPU. The domain parallelism library enables distributed training across unlabeled data from multiple domains by leveraging the domain separation architecture. Both libraries were tested at moderate scale on the Summit supercomputer at Oak Ridge National Laboratory.
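The spatial decomposition idea can be illustrated with a minimal, single-process sketch. It assumes a single "valid" (unpadded) convolution layer and splits the image into horizontal strips. Each strip is extended by kernel_h - 1 halo rows that overlap its neighbour, so convolving the strips independently and stacking the results reproduces the convolution of the full image. The function names (`conv2d_valid`, `split_with_halo`, `decomposed_conv2d`) are hypothetical. The code uses plain Python lists rather than PyTorch tensors, and it does not reflect how the authors' libraries partition data or use the PyTorch RPC interface.

```python
# Hypothetical single-process sketch of spatial decomposition with halo rows.
# Assumptions: one "valid" (unpadded) 2D convolution, row-wise strips only.
# This is not the authors' library and does not use PyTorch RPC.
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Unpadded 2D cross-correlation, the operation a CNN conv layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def split_with_halo(image: Matrix, parts: int, kernel_h: int) -> List[Matrix]:
    """Split an image into row strips, each carrying the halo rows it needs.

    Output rows are divided as evenly as possible among `parts` workers.
    A worker producing output rows [start, stop) needs input rows
    [start, stop + kernel_h - 1), so neighbouring strips overlap by
    kernel_h - 1 rows (the halo).
    """
    out_h = len(image) - kernel_h + 1
    if not 1 <= parts <= out_h:
        raise ValueError("parts must be between 1 and the number of output rows")
    base, extra = divmod(out_h, parts)
    strips = []
    start = 0
    for p in range(parts):
        stop = start + base + (1 if p < extra else 0)
        # Copy the rows so each "worker" owns its own strip of data.
        strips.append([row[:] for row in image[start:stop + kernel_h - 1]])
        start = stop
    return strips


def decomposed_conv2d(image: Matrix, kernel: Matrix, parts: int) -> Matrix:
    """Convolve each strip independently, then stack the partial outputs."""
    strips = split_with_halo(image, parts, len(kernel))
    # In a distributed run each strip would sit on a different GPU; here the
    # strips are simply processed one after another in a single process.
    partial_outputs = [conv2d_valid(strip, kernel) for strip in strips]
    return [row for part in partial_outputs for row in part]


if __name__ == "__main__":
    image = [[float((3 * r + c) % 7) for c in range(6)] for r in range(10)]
    kernel = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]  # Sobel-x
    full = conv2d_valid(image, kernel)
    split = decomposed_conv2d(image, kernel, parts=3)
    print(full == split)  # True
```

This covers only the forward pass of one layer. With several stacked layers, and in the backward pass, neighbouring workers would presumably need to exchange halo data (activations and gradients) rather than rely on a one-time overlapping split.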