SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Whale: Efficient One-to-Many Data Partitioning in RDMA-Assisted Distributed Stream Processing Systems

Authors: Jie Tan, Hanhua Chen, Yonghui Wang, and Hai Jin (Huazhong University of Science and Technology)

Abstract: The one-to-many data partitioning strategy in a distributed stream processing system (DSPS) plays an important role in various applications, where an upstream processing instance sends a tuple to a potentially large number of downstream processing instances. Because existing designs communicate instance by instance, and several downstream instances can reside on the same machine, a DSPS sends the same data item to a machine multiple times. This incurs significant unnecessary serialization and communication costs and leads to a performance bottleneck.

To address this problem, we design and implement Whale, an efficient RDMA-assisted distributed stream processing system. Whale introduces a novel RDMA-assisted stream multicast scheme with a self-adjusting non-blocking tree structure to alleviate the CPU workload of an upstream instance during data partitioning. We redesign the DSPS communication mechanism by replacing instance-oriented communication with a worker-oriented communication scheme, which saves significant costs of redundant serialization and communication. Experimental results show that Whale achieves a 56.6x improvement in system throughput and a 97% reduction in processing latency compared to existing designs.

