No Travel? No Problem.

Remote Participation
Near-Data FPGA-Accelerated Processing of Collective and Inference Operations in Disaggregated Memory Systems
Event Type
Accelerator-based Architectures
Emerging Technologies
Heterogeneous Systems
Memory Systems
Registration Categories
TimeMonday, 15 November 202111am - 11:30am CST
DescriptionWith growing data set sizes, many scientific and data center HPC workloads observe an increasing scaling imbalance, e.g., between compute and memory capacities. As a solution, disaggregated system architectures employ spatial distribution of the different resources. They aim for independent scaling of the different resource kinds (e.g., compute, non-volatile storage, memory), and use fast communication fabrics for their interconnection.

However, for some bulk operations, such as reductions and collections, it is still beneficial to perform them close to the memories, avoiding the need to move large volumes of data over the fabric.

This work realizes a disaggregated system capable of performing such near-data processing (NDP) operations by extending the distributed memory controllers with hardware-accelerated compute capabilities. The actual computations execute on FPGAs and can be abstractly described using C/C++ as compilable by high-level hardware synthesis (HLS) tools.

We have aimed for high usability of our technology also by HPC experts unfamiliar with hardware design. An automated toolflow encapsulates the creation and deployment of the actual accelerators in the disaggregated system. The NDP operations execute distributed across all memory nodes, and are easily accessed using a simple MPI-based programming interface that requires only minimal effort to use in existing applications.

Our solution is demonstrated using a prototype disaggregated system based on the low-latency EXTOLL fabric for communication. We evaluate both conventional reductions/collectives as well as complete machine-learning inference tasks.
Back To Top Button