ELIXR: Eliminating Computation Redundancy in CNN-Based Video Processing
Event Type
Workshop
Architectures
Extreme Scale Computing
Heterogeneous Systems
Time
Friday, 19 November 2021, 11:42am - 11:55am CST
Location
229
Description
Video processing frequently relies on applying convolutional neural networks (CNNs) for various tasks, including object tracking, real-time action classification, and image recognition. Due to complicated network design, processing even a single frame requires many operations, leading to low throughput and high latency. This process can be parallelized, but since consecutive images have similar content, most of these operations produce identical results, leading to inefficient usage of parallel hardware accelerators. In this paper, we present ELIXR, a software system that systematically addresses this computation redundancy problem in an architecture-independent way, using two key techniques. First, ELIXR implements a lightweight change propagation algorithm to automatically determine which data to recompute for each new frame, based on changes in the input. Second, ELIXR implements a dynamic check to further reduce needed computations, by leveraging special operators in the model (e.g., ReLU), and trading off accuracy for performance. We evaluate ELIXR on two real-world models, Inception V3 and ResNet-50, and two video streams. We show that ELIXR running on the CPU produces up to 3.49X speedup (1.76X on average) compared with frame sampling, given the same accuracy and real-time processing requirements, and we describe how our approach can be applied in an architecture-independent way to improve CNN performance in heterogeneous systems.
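The general idea of change propagation across frames can be illustrated with a minimal NumPy sketch. This is not ELIXR's implementation, which the abstract does not detail. It assumes a single-channel "valid" convolution, and all function names (`conv2d_valid`, `dirty_outputs`, `incremental_conv`, `relu_changed`) and the `tol` parameter are hypothetical. The sketch marks which outputs depend on a changed input pixel, recomputes only those, and reuses the cached result for the rest. The last function shows one way a ReLU could limit propagation: a pre-activation that stays non-positive yields the same output (zero), so the change stops there.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive single-channel 'valid' cross-correlation (reference version)."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow), dtype=np.float64)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def dirty_outputs(changed, kh, kw):
    """Mark every output whose receptive field contains a changed input."""
    oh, ow = changed.shape[0] - kh + 1, changed.shape[1] - kw + 1
    dirty = np.zeros((oh, ow), dtype=bool)
    for i, j in zip(*np.nonzero(changed)):
        # Output (r, c) reads input rows r..r+kh-1, so r lies in [i-kh+1, i].
        r0, r1 = max(0, i - kh + 1), min(oh - 1, i)
        c0, c1 = max(0, j - kw + 1), min(ow - 1, j)
        dirty[r0:r1 + 1, c0:c1 + 1] = True
    return dirty

def incremental_conv(prev_in, prev_out, new_in, k, tol=0.0):
    """Recompute only outputs affected by input changes larger than tol.

    tol is a hypothetical knob: 0.0 gives exact results, larger values
    skip more work at the cost of accuracy.
    """
    changed = np.abs(new_in - prev_in) > tol
    kh, kw = k.shape
    dirty = dirty_outputs(changed, kh, kw)
    out = prev_out.copy()  # reuse cached results for clean outputs
    for i, j in zip(*np.nonzero(dirty)):
        out[i, j] = np.sum(new_in[i:i + kh, j:j + kw] * k)
    return out, int(dirty.sum())

def relu_changed(pre_old, pre_new):
    """A change passes through ReLU only if the activated values differ."""
    return np.maximum(pre_old, 0) != np.maximum(pre_new, 0)
```

For example, changing a single interior pixel of an 8x8 frame with a 3x3 kernel marks 9 of the 36 outputs as dirty, and with `tol=0.0` the incremental result matches a full recomputation exactly.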