HPC Ontology: Towards a Unified Ontology for Managing Training Datasets and AI Models for High-Performance Computing

SC21 Proceedings

Workshop: 7th Workshop on Machine Learning in High Performance Environment

Authors: Chunhua Liao and Pei-Hung Lin (Lawrence Livermore National Laboratory), Gaurav Verma (Stony Brook University), Tristan Vanderbruggen (Lawrence Livermore National Laboratory), Murali Emani (Argonne National Laboratory (ANL)), and Zifan Nan and Xipeng Shen (North Carolina State University)

Abstract: Machine learning (ML) techniques have been widely studied to address various challenges of productively and efficiently running large-scale scientific applications on heterogeneous supercomputers. However, it is extremely difficult to generate, access, and maintain the training datasets and AI models needed to accelerate ML-based research. The Future of Research Communications and e-Scholarship has proposed the FAIR data principles, which describe Findability, Accessibility, Interoperability, and Reusability. In this paper, we present our ongoing work on designing an ontology for high-performance computing (named HPC ontology) to make training datasets and AI models FAIR. Our ontology provides controlled vocabularies, explicit semantics, and formal knowledge representations. Our design uses an extensible two-level pattern, capturing both high-level meta information and low-level data content for software, hardware, experiments, workflows, training datasets, AI models, and so on. Preliminary evaluation shows that HPC ontology is effective for annotating selected data and supporting a set of SPARQL queries.
