Whale: Efficient One-to-Many Data Partitioning in RDMA-Assisted Distributed Stream Processing Systems
Resource Management and Scheduling
Time: Thursday, 18 November 2021, 4:30pm - 5pm CST
Description: One-to-many data partitioning in a distributed stream processing system (DSPS), where an upstream processing instance sends a tuple to a potentially large number of downstream processing instances, plays an important role in various applications. As a result, a DSPS ends up sending the same data item to a machine multiple times, incurring significant unnecessary serialization and communication costs and creating a performance bottleneck.
To address the problem, we design and implement Whale, an efficient RDMA-assisted distributed stream processing system. Whale introduces a novel RDMA-assisted stream multicast scheme with a self-adjusting non-blocking tree structure to reduce the CPU workload of an upstream instance during data partitioning. We redesign the DSPS communication mechanism by replacing instance-oriented communication with a worker-oriented communication scheme, which eliminates significant costs of redundant serialization and communication. Experimental results show that Whale achieves a 56.6x improvement in system throughput and a 97% reduction in processing latency compared to existing designs.
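The difference between instance-oriented and worker-oriented communication can be illustrated with a minimal Python sketch. This is not Whale's implementation: the function names, the `placement` mapping, the message layout, and the use of `pickle` as the serializer are all assumptions made for illustration, and the sketch assumes a worker is a process that hosts several downstream instances. It only counts how many times a tuple is serialized and sent under each scheme.

```python
import pickle
from collections import defaultdict


def instance_oriented_send(tup, placement):
    """Hypothetical baseline: serialize and send one copy per downstream instance."""
    messages = []
    for instance, worker in placement.items():
        payload = pickle.dumps(tup)  # serialized again for every instance
        messages.append((worker, [instance], payload))
    return messages


def worker_oriented_send(tup, placement):
    """Hypothetical worker-oriented scheme: serialize once, send one copy per worker."""
    by_worker = defaultdict(list)
    for instance, worker in placement.items():
        by_worker[worker].append(instance)
    payload = pickle.dumps(tup)  # serialized a single time
    # Each message carries the ids of all target instances hosted on that worker,
    # so the receiving worker can dispatch the tuple locally.
    return [(worker, instances, payload) for worker, instances in by_worker.items()]


# Assumed placement: six downstream instances spread over two workers.
placement = {f"instance-{i}": f"worker-{i % 2}" for i in range(6)}
tup = ("stock", "ACME", 101.5)

per_instance = instance_oriented_send(tup, placement)
per_worker = worker_oriented_send(tup, placement)
print(len(per_instance), len(per_worker))
```

In this toy setup the number of messages drops from the number of downstream instances to the number of workers, which is the kind of redundancy the worker-oriented scheme is described as removing. The RDMA transport and the multicast tree are outside the scope of this sketch.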