Presentation

Ribbon: Cost-Effective and QoS-Aware Deep Learning Model Inference Using a Diverse Pool of Cloud Computing Instances

Session: Cloud and Edge Computing

Authors

Event Type: Paper

Time: Tuesday, 16 November 2021, 2pm - 2:30pm CST

Location: 227-228

Description: Deep learning model inference is a key service in many businesses and scientific discovery processes. This paper introduces Ribbon, a novel deep learning inference serving system that balances two competing objectives: meeting a quality-of-service (QoS) target and remaining cost-effective. The key idea behind Ribbon is to employ a diverse set of cloud computing instances (heterogeneous instances) to meet the QoS target while maximizing cost savings. Ribbon devises a Bayesian Optimization-driven strategy that helps users build the optimal set of heterogeneous instances for their model inference service needs on cloud computing platforms. Ribbon outperforms existing inference serving systems that use homogeneous instance pools, saving up to 16% of the inference service cost across different learning models, including emerging deep learning recommender system models and drug-discovery-enabling models.

Paper available from the ACM OpenTOC
