Systematically Inferring I/O Performance Variability by Examining Repetitive Job Behavior
Time: Tuesday, 16 November 2021, 4:30pm - 5pm CST
Description: Monitoring and analyzing I/O behavior is critical to the efficient utilization of parallel storage systems. Unfortunately, with increasing I/O requirements and resource contention, I/O performance variability is becoming a significant concern. This paper investigates I/O behavior and performance variability on a large-scale high-performance computing (HPC) system using a novel methodology that first uses an I/O characterization tool to identify similar jobs from the same application and then detects potential I/O performance variability across those jobs. We demonstrate how this methodology supports temporal and feature analyses that reveal I/O performance variability patterns in production HPC systems, and we discuss the implications of those patterns for operating and managing large-scale systems.