Capturing Relationships Based on Structure Similarity for Self-Describing Scientific Data Formats
Time: Thursday, 18 November 2021, 8:30am - 5pm CST
Location: Second Floor Atrium
Description: Many scientific data sets store data in self-describing files, and the files within a data set are often isolated, with no defined relationships among them. Because scientific data files are managed in isolation, locating, assimilating, and using the relationships relevant to a given query remains a long-standing problem in data discovery. Many relationships are hidden in complex file structures, where different kinds of relationships are not explicitly categorized. To build relationships among scientific data files stored in self-describing formats, we propose a relationship-capturing method based on determining the structure similarity of the files. Similarity is a fundamental concept in computer science. Although "similarity" is not a typical relationship that reflects correlations of properties across different scientific files, structure similarity determination can still be used to capture important relationships. We have evaluated our approach on real-world scientific files, and the results show that it captures relationships effectively and builds them efficiently.
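The abstract does not specify how structure similarity is computed or how relationships are stored. Purely as an illustration of the general idea, the sketch below assumes each file's structure has already been summarized as a mapping from object path (for example, an HDF5 group or dataset path, which h5py's `visititems` can enumerate) to a dtype name. It scores every file pair with Jaccard similarity and links pairs that reach a threshold. The functions `signature`, `jaccard`, and `build_relationships`, the example file names, and the 0.5 threshold are all hypothetical; this is not the authors' method.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# Hypothetical structure summary: object path -> dtype name.
Structure = Dict[str, str]


def signature(structure: Structure) -> Set[Tuple[str, str]]:
    """Reduce a file's structure to a set of (path, dtype) pairs."""
    return set(structure.items())


def jaccard(a: Set[Tuple[str, str]], b: Set[Tuple[str, str]]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty structures count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def build_relationships(
    files: Dict[str, Structure], threshold: float = 0.5
) -> List[Tuple[str, str, float]]:
    """Link every pair of files whose structure similarity reaches the (assumed) threshold."""
    sigs = {name: signature(s) for name, s in files.items()}
    edges = []
    for f1, f2 in combinations(sorted(sigs), 2):
        score = jaccard(sigs[f1], sigs[f2])
        if score >= threshold:
            edges.append((f1, f2, score))
    return edges


# Hypothetical example: two simulation runs sharing a layout, plus an unrelated file.
EXAMPLE_FILES: Dict[str, Structure] = {
    "run_001.h5": {
        "/grid/lat": "float32",
        "/grid/lon": "float32",
        "/fields/temperature": "float64",
    },
    "run_002.h5": {
        "/grid/lat": "float32",
        "/grid/lon": "float32",
        "/fields/temperature": "float64",
        "/fields/pressure": "float64",
    },
    "survey.h5": {"/meta/date": "int64", "/image": "uint16"},
}

if __name__ == "__main__":
    # run_001 and run_002 share 3 of 4 distinct (path, dtype) pairs -> 0.75.
    print(build_relationships(EXAMPLE_FILES))  # [('run_001.h5', 'run_002.h5', 0.75)]
```

Jaccard similarity is used here only because it is simple and order-independent; a real system might also weight attributes, dataset shapes, or hierarchy depth, and would avoid comparing every pair when the number of files is large.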