Understanding Power Variation and its Implications on Performance Optimization on the Cori Supercomputer
Parallel Programming Languages and Models
Time: Monday, 15 November 2021, 12pm - 12:30pm CST
Description: Power is increasingly a limiting factor in supercomputing. The performance and scale of future high-performance computing systems will be determined by how efficiently they manage their power budgets, so any power that goes unused is performance forgone. Regardless of the processors chosen for a future system, it will be necessary to understand power variation and its implications for performance optimization. In this paper, we identify and quantify the factors that affect the power consumption of the NERSC Cori supercomputer at different levels of the system hierarchy. Our study presents node-level power-performance trade-offs for fundamental computational patterns. We show that I/O activity and load imbalance are common causes of job-level power variation in Cori’s production workload. We quantitatively attribute system-level power variation to three sources and find that 86% of the variation is due to temporal variation in application behavior over the duration of a job. Furthermore, our analysis reveals that under typical workload conditions, the Cori system’s power budget could accommodate up to 60% more nodes.
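The claim that a power budget "could accommodate more nodes" is, at its core, headroom arithmetic. The sketch below shows one simple way to reason about it, under assumptions that are ours and not the paper's: the budget is a fixed wattage, added nodes would draw the same per-node power as existing ones, and "typical" system draw is summarized by a nearest-rank percentile of system power samples. The function names, the percentile rule, and the numbers in the usage example are hypothetical and purely illustrative; they are not Cori measurements or the authors' method.

```python
import math


def nearest_rank_percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]


def extra_node_fraction(budget_w: float, node_count: int,
                        system_power_samples_w: list[float],
                        pct: float = 99.0) -> float:
    """Hypothetical estimate of how many more nodes (as a fraction) a budget could power.

    Assumes added nodes draw the same per-node power as the existing ones,
    taken from the chosen percentile of whole-system power samples.
    """
    # Per-node draw implied by the chosen percentile of system power.
    per_node_w = nearest_rank_percentile(system_power_samples_w, pct) / node_count
    # Whole nodes the same budget could power at that per-node draw.
    nodes_that_fit = math.floor(budget_w / per_node_w)
    return nodes_that_fit / node_count - 1


# Illustrative (made-up) numbers: a 1000 W budget, 10 nodes, five system power samples.
samples = [500.0, 550.0, 600.0, 620.0, 625.0]
print(extra_node_fraction(1000.0, 10, samples, pct=100.0))  # about 0.6, i.e. 60% more nodes
```

Choosing a high percentile rather than the mean keeps the estimate conservative: if the system's draw ever approached that percentile with the added nodes running, the budget would still hold. A real provisioning analysis would also need to account for power spikes and some safety margin.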