Performance and Energy Improvement of the ECP Proxy App SW4lite under Various Workloads
Parallel Programming Languages and Models
System Software and Runtime Systems
Time: Sunday, 14 November 2021, 3:31pm - 4pm CST
Description: Energy-efficient execution of scientific applications requires insight into how HPC systems affect application performance and energy consumption. In this paper, we conduct experiments to evaluate the performance of SW4lite under various workloads with two different memory modes on the Cray XC40 Theta at Argonne National Laboratory. We use MuMMI and ensemble learning to build performance-counter-based performance and power models, and we use insights from these models to identify performance and power bottlenecks. We then improve the performance and energy efficiency of SW4lite, focusing on memory-centric optimizations and code modifications. The experimental results show that our performance-counter-guided optimization strategies yield up to 26.97% performance improvement and up to 19.44% energy savings for SW4lite under various workloads on up to 16,384 cores.
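As a rough illustration of how figures like the improvement and saving percentages above are typically derived, the sketch below computes relative runtime reduction and energy saving from a baseline and an optimized run. It assumes that "performance improvement" means reduction in runtime and that energy is average power multiplied by runtime; the abstract does not spell out either definition. The function names and all measurement values are hypothetical and are not taken from the paper.

```python
def percent_reduction(baseline: float, optimized: float) -> float:
    """Relative reduction of `optimized` versus `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline * 100.0


def energy_joules(avg_power_watts: float, runtime_seconds: float) -> float:
    """Energy as average power times runtime (E = P * t)."""
    return avg_power_watts * runtime_seconds


# Hypothetical measurements for one workload (illustrative only).
baseline_runtime_s = 200.0
optimized_runtime_s = 150.0
baseline_power_w = 100.0
optimized_power_w = 110.0  # average power may rise even as runtime drops

runtime_improvement = percent_reduction(baseline_runtime_s, optimized_runtime_s)
energy_saving = percent_reduction(
    energy_joules(baseline_power_w, baseline_runtime_s),
    energy_joules(optimized_power_w, optimized_runtime_s),
)

print(f"Runtime improvement: {runtime_improvement:.2f}%")
print(f"Energy saving: {energy_saving:.2f}%")
```

In this made-up case the optimized run draws more average power but still uses less energy because it finishes sooner, which is one reason runtime improvement and energy saving are reported as separate percentages.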