HDF5 VOL Connector to Apache Arrow

SC21 Proceedings

Authors: Jie Ye, Anthony Kougkas, and Xian-He Sun (Illinois Institute of Technology)

Abstract: Apache Arrow is widely used in big data analytics and cloud computing because of its standardized in-memory columnar format. This columnar, in-memory data representation enables analytical systems and data sources to exchange and process data in real time, and it can serve as an in-memory column store for managing streamed data. Most scientific applications, however, store and access their data through HDF5, which is inefficient at accessing column-oriented data streams. Accessing Apache Arrow data through HDF5 calls would allow these applications to take advantage of transient, column-oriented data streams, such as real-time data from high-speed scientific instruments and cameras. Bridging the gap between scientific applications that use HDF5 and analytic tools that use Apache Arrow could also bring new kinds of data together. This work therefore introduces an HDF5 Virtual Object Layer (VOL) connector that allows applications to access Apache Arrow data through native HDF5 calls.

Best Poster Finalist (BP): no
