A Benchmark to Understand Communication Performance in Hybrid MPI and GPU Applications
Parallel Programming Languages and Models
Time: Sunday, 14 November 2021, 4:00pm - 4:30pm CST
Description: Analyzing MPI communication costs on extreme-scale HPC systems is critical to ensuring optimal performance. Factors such as scalability and the widespread use of GPUs complicate this analysis. To address this challenge, we need benchmarks and tools that use GPU and host memory in a manner similar to production applications, and that provide sufficient options to, for example, fine-tune and analyze the costs of low-level data exchange between host and device memory. In this extended abstract, we describe the prototype implementation of a GPU-focused communication benchmark, written in Kokkos/C++ and based on FIESTA, and an MPI profiling library integrated with NVIDIA's nvprof profiling tool. Using this combination, we demonstrate how GPU data movement overheads can vary and examine how these tools help us understand performance behavior. We believe these tools are important contributions that will guide data movement decisions on extreme-scale HPC systems.