Capturing Relationships Based on Structure Similarity for Self-Describing Scientific Data Formats

SC21 Proceedings

Authors: Chenxu Niu and Wei Zhang (Texas Tech University), Suren Byna (Lawrence Berkeley National Laboratory (LBNL)), and Yong Chen (Texas Tech University)

Abstract: Many scientific data sets store data in self-describing files, yet the files within a data set are often managed in isolation, with no definition of the relationships among them. Because of this isolated management, locating, assimilating, and utilizing relationships for a given query remains a long-standing problem in data discovery. Many relationships are hidden in complex file structures, where different kinds of relationships are not explicitly categorized. To build relationships among scientific data files stored in self-describing formats, we propose a relationship-capturing method based on determining the structure similarity of the files. Similarity is a fundamental concept in computer science, and it is a typical relationship that reflects correlations among the properties of different scientific files; structure similarity determination can therefore be used to capture important relationships. We have evaluated our approach on real-world scientific files, where it demonstrates effective relationship capturing and efficient relationship building.
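
As a rough illustration of what structure similarity between self-describing files could look like, the sketch below flattens each file's hierarchy into a set of path and type signatures and compares the sets with Jaccard similarity. This is not the method from the poster, whose details the abstract does not give: the nested-dict representation of a file, the Jaccard measure, the 0.5 threshold, and all names (`structure_paths`, `jaccard`, `similar_pairs`, `files`) are assumptions chosen for illustration only.

```python
from itertools import combinations

def structure_paths(node, prefix=""):
    """Flatten a nested dict into a set of 'path:type' strings.

    Assumption: a group is modeled as a dict and a dataset as a dtype string.
    """
    paths = set()
    for name, child in node.items():
        path = f"{prefix}/{name}"
        if isinstance(child, dict):
            paths.add(f"{path}:group")
            paths |= structure_paths(child, path)  # recurse into the group
        else:
            paths.add(f"{path}:{child}")
    return paths

def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def similar_pairs(files, threshold=0.5):
    """Return (file_x, file_y, score) for every pair at or above the threshold."""
    signatures = {name: structure_paths(tree) for name, tree in files.items()}
    pairs = []
    for x, y in combinations(sorted(signatures), 2):
        score = jaccard(signatures[x], signatures[y])
        if score >= threshold:
            pairs.append((x, y, score))
    return pairs

# Hypothetical file structures, stand-ins for real self-describing files.
files = {
    "run_a.h5": {"climate": {"temp": "float32", "pressure": "float32"}},
    "run_b.h5": {"climate": {"temp": "float32", "humidity": "float32"}},
    "mesh.h5": {"grid": {"nodes": "int64"}},
}
```

Calling `similar_pairs(files)` on this toy input relates `run_a.h5` to `run_b.h5`, which share a group and one dataset, and leaves `mesh.h5` unrelated. In practice the hierarchy would be read from the files themselves rather than written by hand.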

Best Poster Finalist (BP): no
