Padding to Extend the Bruck Algorithm for Non-Uniform All-to-All Communication
Event Type
Posters
Research Posters
Registration Categories
TP
XO / EX
Time
Tuesday, 16 November 2021, 8:30am - 5pm CST
Location
Second Floor Atrium
Description
The latency of standard MPI_Alltoallv implementations is linear in the number of processes. This linear cost scales poorly when applications run on millions of cores and exchange short messages, a regime that is dominated by latency. Bruck's algorithm is a classic logarithmic algorithm for uniform all-to-all communication. It does not, however, support messages of varying sizes. In this paper, we present the Padded Bruck algorithm, a natural extension that applies Bruck's algorithm to non-uniform all-to-all communication by transforming it into uniform all-to-all communication. We also analyze several variants of Bruck's algorithm and investigate the underlying causes of their behavior, with the ultimate goal of gaining insights that can be applied to our Padded Bruck algorithm. Our evaluation shows that our algorithm outperforms Cray's MPI_Alltoallv in most cases when the message size is less than 1024 bytes and the process count is smaller than 8192.
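The padding idea can be illustrated with a small single-process simulation. The following is a minimal sketch, not the authors' implementation: it uses no MPI, models each rank as a Python list, and the function name `padded_bruck_alltoallv` and the 4-byte length header used to recover the true message size are assumptions made for illustration. Every block is padded to the global maximum size, after which the textbook three-phase Bruck exchange (local rotation, logarithmic rounds, inverse rotation) runs on uniform blocks.

```python
import struct


def padded_bruck_alltoallv(send):
    """Simulate a padded Bruck all-to-all.

    send[p][d] is the bytes object that rank p sends to rank d.
    Returns (recv, rounds), where recv[d][s] is what rank d got from rank s.
    """
    P = len(send)
    # Global maximum block size; a real MPI code would need a reduction here.
    max_len = max(len(b) for row in send for b in row)

    def pad(b):
        # Assumed wire format: 4-byte little-endian length, then zero-padded payload.
        return struct.pack("<I", len(b)) + b.ljust(max_len, b"\0")

    def unpad(b):
        (n,) = struct.unpack_from("<I", b)
        return b[4:4 + n]

    # Phase 1: local rotation. Slot i on rank p holds the block bound for (p + i) % P,
    # so the slot index equals the distance the block still has to travel.
    buf = [[pad(send[p][(p + i) % P]) for i in range(P)] for p in range(P)]

    # Phase 2: ceil(log2(P)) rounds. In the round with step k, every rank forwards
    # the slots whose index has bit k set to rank (p + k) % P.
    k, rounds = 1, 0
    while k < P:
        new = [row[:] for row in buf]
        for p in range(P):
            dst = (p + k) % P
            for i in range(P):
                if i & k:
                    new[dst][i] = buf[p][i]
        buf = new
        k <<= 1
        rounds += 1

    # Phase 3: inverse rotation. Slot i on rank p now holds data from rank (p - i) % P.
    recv = [[unpad(buf[p][(p - s) % P]) for s in range(P)] for p in range(P)]
    return recv, rounds
```

With five simulated ranks the exchange completes in three rounds rather than four point-to-point steps per rank, at the price of sending every block at the maximum size. That trade-off is consistent with the abstract's observation that the approach pays off mainly for short messages.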