MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers

SC21 Proceedings

Authors: Kiran Ranganath (University of California, Riverside); Joshua D. Suetterlein and Joseph Manzano (Pacific Northwest National Laboratory (PNNL)); Shuaiwen Leon Song (University of Sydney); and Daniel Wong (University of California, Riverside)

Abstract: Multi-accelerator servers are increasingly deployed in shared multi-tenant environments (such as cloud data centers) to meet the demands of large-scale, compute-intensive workloads. In addition, these accelerators are increasingly interconnected in complex topologies, and workloads exhibit a wider variety of inter-accelerator communication patterns. However, existing allocation policies are ill-suited to these emerging use cases. Specifically, this work identifies that multi-accelerator workloads are commonly fragmented, leading to reduced bandwidth and increased latency for inter-accelerator communication.

We propose Multi-Accelerator Pattern Allocation (MAPA), a graph pattern mining approach that provides generalized allocation support for multi-accelerator workloads on multi-accelerator servers. We demonstrate that MAPA improves the execution time of multi-accelerator workloads and provides generalized benefits across various accelerator topologies. Finally, we demonstrate a speedup of 12.4% for the 75th percentile of jobs, with the worst-case execution time reduced by up to 35% relative to the baseline policy.
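
To make the fragmentation problem concrete, the following is a minimal illustrative sketch and not the MAPA algorithm itself. It assumes a hypothetical 4-GPU server whose link bandwidths are invented for this example, and it swaps in a brute-force search over GPU subsets in place of graph pattern mining. The names BANDWIDTH, allocation_score, first_fit, and best_allocation are all hypothetical.

```python
from itertools import combinations

# Hypothetical link bandwidths (GB/s) between GPU pairs.
# These values are invented for illustration and do not come from the paper.
BANDWIDTH = {
    frozenset({0, 1}): 50, frozenset({2, 3}): 50,
    frozenset({0, 2}): 25, frozenset({1, 3}): 25,
    frozenset({0, 3}): 10, frozenset({1, 2}): 10,
}

def allocation_score(gpus):
    """Sum of pairwise link bandwidth among the GPUs in an allocation."""
    return sum(BANDWIDTH[frozenset(pair)] for pair in combinations(gpus, 2))

def first_fit(free_gpus, k):
    """Topology-unaware baseline: take the k lowest-numbered free GPUs."""
    return tuple(sorted(free_gpus)[:k])

def best_allocation(free_gpus, k):
    """Topology-aware choice: exhaustively pick the k free GPUs with the
    highest aggregate bandwidth (brute force, feasible only for small servers)."""
    return max(combinations(sorted(free_gpus), k), key=allocation_score)

# GPU 1 is occupied by another tenant, so GPUs 0, 2, and 3 are free.
free = {0, 2, 3}
naive = first_fit(free, 2)
aware = best_allocation(free, 2)
print(naive, allocation_score(naive))
print(aware, allocation_score(aware))
```

Under these assumed bandwidths, the first-fit baseline places a two-GPU job on a slower link than the topology-aware choice, which is the kind of fragmentation penalty the abstract describes.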
