SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Padding to Extend the Bruck Algorithm for Non-Uniform All-to-All Communication

Authors: Ke Fan, Thomas Gilray, and Sidharth Kumar (University of Alabama, Birmingham)

Abstract: The latency of the standard MPI_Alltoallv implementations is linear in the number of processes. Such linear complexity performs poorly when applications are deployed on millions of cores for short messages, which is dominated by latency. Bruck's algorithm is a classic logarithm algorithm for uniform all-to-all communication. It fails, however, to support messages of varying sizes. In this paper, we present the Padded Bruck algorithm, a natural extension strategy for applying Bruck’s algorithm to non-uniform all-to-all communication by transforming it into uniform all-to-all communication. We also analyze several variants of Bruck’s algorithm and investigate the underlying causes of their behavior, with the ultimate goal of gaining insights that can be applied to our Padded Bruck algorithm. When compared to Cray’s MPI_Alltoallv, our evaluation shows that our algorithm outperforms in most cases if the message size is less than 1024 bytes and the process count is smaller than 8192.

Best Poster Finalist (BP): no

Poster: PDF
Poster summary: PDF

Back to Poster Archive Listing