Ribbon: Cost-Effective and QoS-Aware Deep Learning Model Inference Using a Diverse Pool of Cloud Computing Instances
Session: Cloud and Edge Computing
Event Type: Paper

Resource Management and Scheduling
TP
Time: Tuesday, 16 November 2021, 2pm - 2:30pm CST
Location: 227-228
Description: Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces Ribbon, a novel deep learning inference serving system that balances two competing objectives: meeting a quality-of-service (QoS) target and remaining cost-effective. The key idea behind Ribbon is to intelligently employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. Ribbon devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms. Ribbon demonstrates its superiority over existing inference serving approaches that use homogeneous instance pools, saving up to 16% of the inference service cost across different learning models, including emerging deep learning recommender system models and drug-discovery-enabling models.
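To illustrate why a mixed pool can undercut a homogeneous one, here is a minimal sketch. It is not Ribbon's method: it replaces the Bayesian Optimization search with brute-force enumeration, and it assumes each instance contributes a fixed, additive serving capacity under the QoS target. The instance names, prices, and capacities (`type_a`, `type_b`) are hypothetical and are not taken from the paper.

```python
from itertools import product
from math import ceil

# Hypothetical catalog (not from the paper):
# name -> (hourly cost in cents, queries/sec served within the QoS target)
CATALOG = {"type_a": (100, 100), "type_b": (30, 25)}


def pool_cost(pool):
    """Total hourly cost in cents of a pool given as {type: count}."""
    return sum(CATALOG[name][0] * count for name, count in pool.items())


def pool_capacity(pool):
    """Total queries/sec, assuming capacities simply add up (a simplification)."""
    return sum(CATALOG[name][1] * count for name, count in pool.items())


def cheapest_pool(demand_qps, allowed_types):
    """Enumerate every pool over allowed_types and return the cheapest feasible one.

    Brute force stands in for the Bayesian Optimization search described above;
    it is only practical for a tiny catalog like this one.
    """
    # No type ever needs more instances than it takes to serve the demand alone.
    max_counts = [ceil(demand_qps / CATALOG[t][1]) for t in allowed_types]
    best = None
    for counts in product(*(range(m + 1) for m in max_counts)):
        pool = dict(zip(allowed_types, counts))
        if pool_capacity(pool) < demand_qps:
            continue  # this pool would miss the QoS target
        if best is None or pool_cost(pool) < pool_cost(best):
            best = pool
    return best


demand = 230  # hypothetical load in queries/sec
heterogeneous = cheapest_pool(demand, ["type_a", "type_b"])
homogeneous = min((cheapest_pool(demand, [t]) for t in CATALOG), key=pool_cost)

print("heterogeneous:", heterogeneous, pool_cost(heterogeneous), "cents/hour")
print("homogeneous:  ", homogeneous, pool_cost(homogeneous), "cents/hour")
```

With these made-up numbers the mixed pool wins because the small instances fill the last slice of demand without paying for a whole additional large instance. In a real deployment the QoS check would have to be measured (for example, tail latency under load) rather than computed from additive capacities, which is one reason a sample-efficient search such as Bayesian Optimization is attractive.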




