SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Near-Data FPGA-Accelerated Processing of Collective and Inference Operations in Disaggregated Memory Systems


Workshop: H2RC: Seventh International Workshop on Heterogeneous High-Performance Reconfigurable Computing

Authors: Carsten Heinz and Andreas Koch (Embedded Systems and Applications Group, TU Darmstadt)


Abstract: With growing data set sizes, many scientific and data center HPC workloads observe an increasing scaling imbalance, e.g., between compute and memory capacities. As a solution, disaggregated system architectures employ spatial distribution of the different resources. They aim for independent scaling of the different resource kinds (e.g., compute, non-volatile storage, memory), and use fast communication fabrics for their interconnection.

However, for some bulk operations, such as reductions and collections, it is still beneficial to perform them close to the memories, avoiding the need to move large volumes of data over the fabric.

This work realizes a disaggregated system capable of performing such near-data processing (NDP) operations by extending the distributed memory controllers with hardware-accelerated compute capabilities. The actual computations execute on FPGAs and can be abstractly described using C/C++ as compilable by high-level hardware synthesis (HLS) tools.
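As an illustration only, and not the authors' code, a reduction of the kind mentioned above might be written in plain C++ in a style that HLS tools commonly accept: a single counted loop, no dynamic allocation, and no recursion. The function name `ndp_reduce_sum`, its signature, and the choice of 32-bit integer elements are assumptions made for this sketch and do not come from the paper.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical near-data reduction kernel (name and signature are
// assumptions for illustration, not taken from the paper).
// Sums `count` 32-bit integers held in a memory node's local buffer.
// A 64-bit accumulator is used so that partial sums of 32-bit inputs
// are far less likely to overflow.
std::int64_t ndp_reduce_sum(const std::int32_t *data, std::size_t count) {
    std::int64_t acc = 0;
    // A simple counted loop; an HLS tool could pipeline or unroll it.
    for (std::size_t i = 0; i < count; ++i) {
        acc += data[i];
    }
    return acc;
}
```

In such a scheme, each memory node would return only its scalar partial result, so the full buffer would not need to cross the fabric.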

We have aimed to make our technology highly usable, including for HPC experts unfamiliar with hardware design. An automated toolflow encapsulates the creation and deployment of the actual accelerators in the disaggregated system. The NDP operations execute distributed across all memory nodes, and are accessed through a simple MPI-based programming interface that requires only minimal effort to use in existing applications.

Our solution is demonstrated using a prototype disaggregated system based on the low-latency EXTOLL fabric for communication. We evaluate both conventional reductions/collectives and complete machine-learning inference tasks.







