SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Ribbon: Cost-Effective and QoS-Aware Deep Learning Model Inference Using a Diverse Pool of Cloud Computing Instances


Authors: Baolin Li, Rohan Roy, and Tirthak Patel (Northeastern University); Vijay Gadepally and Karen Gettings (Massachusetts Institute of Technology (MIT) Lincoln Laboratory); and Devesh Tiwari (Northeastern University)

Abstract: Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces Ribbon, a novel deep learning inference serving system that meets two competing objectives: the quality-of-service (QoS) target and cost-effectiveness. The key idea behind Ribbon is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. Ribbon devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms. Ribbon outperforms existing inference serving approaches that use homogeneous instance pools, saving up to 16% of the inference service cost across different deep learning models, including emerging deep learning recommender system models and drug-discovery enabling models.
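The optimization problem the abstract describes can be illustrated with a small sketch. It is not Ribbon's method: the abstract says Ribbon uses Bayesian Optimization, whereas this sketch simply enumerates every pool in a tiny, hypothetical catalog. The instance names, hourly prices (in cents) and per-instance capacities are invented for illustration. QoS is also simplified to an assumed capacity check (aggregate queries per second sustainable within the latency target must cover the load). Under those assumptions it shows the problem being solved (minimize hourly cost subject to meeting QoS) and how a mixed pool can undercut the cheapest homogeneous pool. Bayesian Optimization is generally preferred when each evaluation is expensive, for example when a candidate pool must be deployed and measured, so exhaustive enumeration like this only works for toy search spaces.

```python
from itertools import product

# Hypothetical catalog (illustrative assumptions, not data from the paper):
# hourly price in integer cents, and the query rate (queries/s) one instance
# is assumed to sustain while still meeting the latency target.
CATALOG = {
    "gpu_large": {"cents_per_hour": 300, "qps": 400},
    "cpu_medium": {"cents_per_hour": 40, "qps": 40},
    "cpu_small": {"cents_per_hour": 25, "qps": 15},
}


def pool_cost(pool):
    """Hourly cost in cents of a pool given as {instance_type: count}."""
    return sum(CATALOG[t]["cents_per_hour"] * n for t, n in pool.items())


def meets_qos(pool, load_qps):
    """Simplified QoS model (assumption): latency-bounded capacity covers the load."""
    return sum(CATALOG[t]["qps"] * n for t, n in pool.items()) >= load_qps


def cheapest_pool(load_qps, max_per_type=15):
    """Exhaustively search heterogeneous pools (a stand-in for Bayesian Optimization)."""
    types = list(CATALOG)
    best = None
    for counts in product(range(max_per_type + 1), repeat=len(types)):
        pool = dict(zip(types, counts))
        if meets_qos(pool, load_qps) and (best is None or pool_cost(pool) < pool_cost(best)):
            best = pool
    if best is None:
        return None
    # Drop instance types with a count of zero for readability.
    return {t: n for t, n in best.items() if n}


def cheapest_homogeneous(load_qps):
    """Cheapest pool built from a single instance type."""
    best = None
    for t, spec in CATALOG.items():
        n = -(-load_qps // spec["qps"])  # ceiling division: instances needed
        pool = {t: n}
        if best is None or pool_cost(pool) < pool_cost(best):
            best = pool
    return best


if __name__ == "__main__":
    mixed = cheapest_pool(500)
    single = cheapest_homogeneous(500)
    print(mixed, pool_cost(mixed))    # {'gpu_large': 1, 'cpu_medium': 3} 420
    print(single, pool_cost(single))  # {'cpu_medium': 13} 520
```

With a load of 500 queries/s in this toy catalog, one large GPU instance plus three medium CPU instances costs 420 cents per hour, compared with 520 for the cheapest single-type pool. The size of that gap comes entirely from the invented numbers and says nothing about the savings reported for Ribbon.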
