SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

User-Centric System Fault Identification Using IO500 Benchmark


Workshop: PDSW: Sixth International Parallel Data Systems Workshop

Authors: Radita Liem (RWTH Aachen University) and Gerald Lofstead (Sandia National Laboratories)


Abstract: I/O performance in a multi-user environment is difficult to predict. Users do not know what to expect when running and tuning their applications for better I/O performance. We propose using the IO500 benchmark to guide user expectations of their application's I/O performance and to identify root causes of I/O problems that might come from the system. Our experiments cover the first step, in which we manage user expectations with IO500 and provide a mechanism for system fault identification. This work also provides information about the tail latency problem that needs to be addressed and granular information about the impact of I/O technique choices (POSIX and MPI-IO).




