Authors: Neeraj Rajesh (Illinois Institute of Technology); Quincey Koziol, Suren Byna, Houjun Tang, and Jean Luca Bez (Lawrence Berkeley National Laboratory (LBNL)); and Anthony Kougkas and Xian-He Sun (Illinois Institute of Technology)
Abstract: Feature reduction is an integral part of data preparation in machine learning: it helps denoise the data and makes the model easier to fit. Predicting an application's performance from Darshan counters can be difficult because of the large number of counters available, not all of which are pertinent to I/O performance. Several feature-reduction methods exist, the most common being recursive feature elimination (RFE). The RFE method aims to relate the features to a specific target value. We aim to find a subset of features that can distinguish between different applications, and then evaluate that subset by building a model to predict I/O performance. We compare this model with similar models built from all the features and from a subset of features selected by RFE as implemented in scikit-learn.
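The RFE baseline mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the data below is synthetic rather than real Darshan counters, the choice of LinearRegression as the estimator and of three selected features is an assumption made for illustration, and the application-distinguishing feature subset proposed in the poster is not reproduced here.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for a table of counters: 200 jobs x 10 counters.
# (Hypothetical data; real Darshan logs would be parsed into a similar matrix.)
n_jobs, n_counters = 200, 10
X = rng.normal(size=(n_jobs, n_counters))

# Assumed target: only counters 0, 3 and 7 influence the "I/O performance".
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 1.5 * X[:, 7] + 0.1 * rng.normal(size=n_jobs)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Baseline: a model trained on all counters.
full_model = LinearRegression().fit(X_train, y_train)
full_r2 = full_model.score(X_test, y_test)

# RFE: repeatedly fit the estimator and drop the least important feature
# until only n_features_to_select remain.
selector = RFE(LinearRegression(), n_features_to_select=3, step=1)
selector.fit(X_train, y_train)

selected = np.flatnonzero(selector.support_)  # indices of the kept counters
rfe_r2 = selector.score(X_test, y_test)       # R^2 using only the kept counters

print("selected counters:", selected.tolist())
print("R^2 with all counters:", round(full_r2, 3))
print("R^2 with RFE subset:", round(rfe_r2, 3))
```

Comparing the two scores on held-out jobs mirrors the kind of comparison the abstract describes: if the reduced model performs about as well as the full one, the discarded counters carried little information about the target.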
Best Poster Finalist (BP): no
Poster summary: PDF