SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Systematically Inferring I/O Performance Variability by Examining Repetitive Job Behavior


Authors: Emily Costa and Tirthak Patel (Northeastern University), Benjamin Schwaller and James Brandt (Sandia National Laboratories), and Devesh Tiwari (Northeastern University)

Abstract: Monitoring and analyzing I/O behavior is critical to the efficient utilization of parallel storage systems. Unfortunately, with increasing I/O requirements and resource contention, I/O performance variability is becoming a significant concern. This paper investigates I/O behavior and performance variability on a large-scale high-performance computing (HPC) system using a novel methodology that first identifies similarity across jobs from the same application, leveraging an I/O characterization tool, and then detects potential I/O performance variability across those jobs. We demonstrate and discuss how this methodology can be used to perform temporal and feature analyses that detect interesting I/O performance variability patterns in production HPC systems, and we discuss the implications of these patterns for operating and managing large-scale systems.
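The two steps named in the abstract (grouping similar jobs of the same application, then checking performance variability within each group) can be illustrated with a minimal sketch. This is not the authors' implementation. The record fields (`app`, `bytes`, `throughput`), the 10% similarity tolerance, and the use of the coefficient of variation as the variability measure are all assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_similar_jobs(jobs, tolerance=0.10):
    """Group jobs of the same application with similar I/O volume.

    Hypothetical similarity rule: a job joins a group if its byte count is
    within `tolerance` of the smallest job already in that group.
    """
    by_app = defaultdict(list)
    for job in jobs:
        by_app[job["app"]].append(job)

    groups = []
    for app_jobs in by_app.values():
        app_jobs.sort(key=lambda j: j["bytes"])
        current = [app_jobs[0]]
        for job in app_jobs[1:]:
            # Compare against the smallest job in the current group.
            if job["bytes"] <= current[0]["bytes"] * (1 + tolerance):
                current.append(job)
            else:
                groups.append(current)
                current = [job]
        groups.append(current)
    return groups


def throughput_cov(group):
    """Coefficient of variation (std / mean) of throughput within a group."""
    values = [j["throughput"] for j in group]
    avg = mean(values)
    return pstdev(values) / avg if avg else 0.0


# Made-up records, purely illustrative (not data from the paper).
jobs = [
    {"app": "sim", "bytes": 100, "throughput": 10.0},
    {"app": "sim", "bytes": 105, "throughput": 20.0},
    {"app": "sim", "bytes": 500, "throughput": 30.0},
    {"app": "viz", "bytes": 100, "throughput": 5.0},
]
groups = group_similar_jobs(jobs)
for group in groups:
    print(group[0]["app"], len(group), round(throughput_cov(group), 3))
```

A group whose jobs move similar amounts of data but show a high coefficient of variation in throughput would be a candidate for closer inspection. In practice, the similarity features would come from the I/O characterization tool's logs rather than from a single byte count.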



