Architectural Requirements for Deep Learning Workloads in HPC Environments
Parallel Programming Languages and Models
Time: Monday, 15 November 2021, 9:30am - 10am CST
Description: Scientific machine learning (SciML) promises to have a transformational impact on scientific exploration by combining state-of-the-art AI methods with the latest generation of supercomputers. To efficiently leverage ML techniques on high-performance computing (HPC) systems, however, it is critical to understand the performance characteristics of the underlying algorithms on modern computational systems. In this work, we present a new methodology for developing a detailed performance understanding of ML benchmarks. To demonstrate our approach, we investigate two emerging SciML benchmark applications from cosmology and climate, CosmoFlow and DeepCAM, as well as ResNet-50, a well-known image classification model. We develop and validate performance models that explore the key architectural artifacts, including memory requirements, data reuse, and performance efficiency across both single- and multi-GPU computations. Our methodology focuses on the complexity of data movement across storage and memory hierarchies, and leverages our performance models to capture key components of runtime execution while highlighting design tradeoffs.
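The abstract does not specify the form of the performance models; as a minimal sketch, a roofline-style bound is one common way to relate memory requirements and data reuse to runtime efficiency. The function name, parameters, and hardware numbers below are illustrative assumptions, not the authors' actual model.

```python
# Hypothetical roofline-style runtime estimate (illustrative only):
# runtime is bounded below by whichever dominates, compute work at
# peak FLOP rate or data movement at peak memory bandwidth.

def roofline_runtime(flops, bytes_moved, peak_flops, peak_bw):
    """Lower-bound runtime (seconds) for a kernel with the given
    FLOP count and bytes moved, on a device with the given peaks."""
    compute_time = flops / peak_flops     # time if compute-bound
    memory_time = bytes_moved / peak_bw   # time if bandwidth-bound
    return max(compute_time, memory_time)

# Assumed example values: a layer doing 2 TFLOP of work and moving
# 40 GB, on a GPU with 15 TFLOP/s peak and 900 GB/s bandwidth.
t = roofline_runtime(flops=2.0e12, bytes_moved=4.0e10,
                     peak_flops=15e12, peak_bw=900e9)
intensity = 2.0e12 / 4.0e10  # arithmetic intensity: 50 FLOPs/byte
```

Higher data reuse raises arithmetic intensity, shifting a kernel from the memory-bound to the compute-bound regime; the multi-GPU models in the talk additionally account for inter-device communication.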