Architectural Requirements for Deep Learning Workloads in HPC Environments

SC21 Proceedings

Workshop: PMBS21: The 12th International Workshop on Performance Modeling, Benchmarking and Simulation of High-Performance Computer Systems

Authors: Khaled Ibrahim, Tan Nguyen, Hai Ah Nam, Wahid Bhimji, Steven Farrell, Leonid Oliker, Michael Rowan, Nicholas Wright, and Samuel Williams (Lawrence Berkeley National Laboratory (LBNL))

Abstract: Scientific machine learning (SciML) promises to have a transformational impact on scientific exploration by combining state-of-the-art AI methods with the latest generation of supercomputers. To efficiently leverage ML techniques on high-performance computing (HPC) systems, however, it is critical to understand the performance characteristics of the underlying algorithms on modern computational systems. In this work, we present a new methodology for developing a detailed performance understanding of ML benchmarks. To demonstrate our approach, we investigate two emerging SciML benchmark applications from cosmology and climate, CosmoFlow and DeepCAM, as well as ResNet-50, a well-known image classification model. We develop and validate performance models that explore the key architectural artifacts, including memory requirements, data reuse, and performance efficiency across both single- and multiple-GPU computations. Our methodology focuses on the complexity of data movement across storage and memory hierarchies, and leverages our performance models to capture key components of runtime execution while highlighting design tradeoffs.
