SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Did the GPU Obfuscate the Load Imbalance in My MPI Simulation?


Workshop: HiPar21: 2nd Workshop on Hierarchical Parallelism for Exascale Computing

Authors: David Eberius (Oak Ridge National Laboratory (ORNL)); David Boehme and Olga Pearce (Lawrence Livermore National Laboratory)


Abstract: The proliferation of GPU-based HPC systems calls for a method to assess the performance of simulations on heterogeneous machines. Adding GPUs to a system introduces multiple hierarchical levels of parallelism into the node architecture. In this paper, we demonstrate that the traditional load imbalance metric is insufficient for capturing load imbalance on GPU-based machines, because it treats the GPU as a monolithic entity and ignores its internal parallelism. We propose a new hierarchical metric that improves the correlation between measured performance and application workload by up to 20.61%. Using our metric instead of the traditional one to determine the application load supplied to the load balancing algorithm reduces the residual load imbalance by up to 4x in our application.
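
The abstract does not define the traditional load imbalance metric. One widely used definition is the percent imbalance, (max load / mean load - 1) x 100, computed over per-process loads. The Python sketch below assumes that definition and uses made-up numbers to show how per-rank totals can look balanced while the work inside each GPU is not. The function name, the per-unit work lists, and the second calculation are illustrative assumptions only; they are not the hierarchical metric proposed in the paper.

```python
from statistics import mean


def percent_imbalance(loads):
    """Percent imbalance, (max / mean - 1) * 100, over a list of loads.

    Assumed definition of the "traditional" metric; the abstract does not
    state the exact formula the authors use.
    """
    if not loads:
        raise ValueError("loads must be non-empty")
    avg = mean(loads)
    if avg == 0:
        return 0.0
    return (max(loads) / avg - 1.0) * 100.0


# Hypothetical work assigned to the parallel units inside each rank's GPU.
# Both ranks hold 16 units of work in total, but rank 1 packs all of it
# onto a single unit.
per_rank_unit_work = [
    [4, 4, 4, 4],   # rank 0: evenly spread inside the GPU
    [16, 0, 0, 0],  # rank 1: concentrated on one unit
]

# Monolithic view: one total per rank, internal distribution ignored.
rank_totals = [sum(units) for units in per_rank_unit_work]
monolithic = percent_imbalance(rank_totals)

# Illustrative alternative (not the paper's metric): compare the busiest
# unit inside each rank, which approximates each rank's critical path.
busiest_units = [max(units) for units in per_rank_unit_work]
internal_view = percent_imbalance(busiest_units)

print(f"monolithic view: {monolithic:.1f}%")
print(f"busiest-unit view: {internal_view:.1f}%")
```

In this toy case the monolithic view reports no imbalance because both totals are equal, while the busiest-unit view reports a substantial one. This is the kind of hidden imbalance the abstract attributes to treating the GPU as a single entity.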




