Did the GPU Obfuscate the Load Imbalance in My MPI Simulation?
Parallel Programming Languages and Models
Resource Management and Scheduling
Time: Sunday, 14 November 2021, 12pm - 12:30pm CST
Description: The current proliferation of GPU-based HPC systems calls for a method of assessing simulation performance on heterogeneous machines. Adding GPUs to a system introduces multiple hierarchical levels of parallelism into the node architecture. In this paper, we demonstrate that the traditional load imbalance metric is insufficient for capturing load imbalance on GPU-based machines, because it treats each GPU as a monolithic entity and ignores its internal parallelism. We propose a new hierarchical metric that improves the correlation between measured performance and application workload by up to 20.61%. Using our metric instead of the traditional one to determine application load, as the input to the load balancing algorithm, reduces the residual load imbalance by up to 4x in our application.
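To make the "monolithic GPU" problem concrete, here is a minimal illustrative sketch. It assumes the common form of the traditional metric, lambda = max(L) / mean(L) - 1, computed over per-rank loads (some sources use max/mean without the -1). The abstract does not define the authors' hierarchical metric, so the `sm_bound_gpu_loads` function below is a hypothetical stand-in, not their method: it assumes a GPU finishes only when its busiest streaming multiprocessor (SM) finishes. The function names and the per-SM work values are invented for illustration.

```python
from typing import List, Sequence


def load_imbalance(loads: Sequence[float]) -> float:
    """Traditional metric: lambda = max(L) / mean(L) - 1; 0.0 means perfect balance."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean - 1.0


def monolithic_gpu_loads(per_sm_work: Sequence[Sequence[float]]) -> List[float]:
    """Treat each rank's GPU as one unit: its load is the total work it received."""
    return [float(sum(sms)) for sms in per_sm_work]


def sm_bound_gpu_loads(per_sm_work: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical hierarchical view (assumption, not the paper's metric):
    a rank's GPU is done only when its most heavily loaded SM is done."""
    return [float(max(sms)) for sms in per_sm_work]


if __name__ == "__main__":
    # Two MPI ranks, each driving one GPU with four SMs (illustrative numbers).
    # Both GPUs receive 16 work units in total, but rank 1's work is skewed.
    work = [[4, 4, 4, 4], [10, 2, 2, 2]]
    print(load_imbalance(monolithic_gpu_loads(work)))  # 0.0: looks balanced
    print(load_imbalance(sm_bound_gpu_loads(work)))    # about 0.43: imbalance revealed
```

Under the monolithic view, both ranks carry 16 units and the traditional metric reports perfect balance. Once intra-GPU distribution is taken into account, rank 1 is bottlenecked by a single SM. This is the kind of hidden imbalance that the abstract argues the traditional metric misses.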