MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Workshop: 7th Workshop on Machine Learning in High Performance Environment

Authors: Steven Farrell (Lawrence Berkeley National Laboratory (LBNL)); Murali Emani (Argonne National Laboratory (ANL)); Jacob Balma (Hewlett Packard Enterprise); Lukas Drescher (Swiss National Supercomputing Centre (CSCS)); Aleksandr Drozd (RIKEN Center for Computational Science (R-CCS)); Andreas Fink (Swiss National Supercomputing Centre (CSCS)); Geoffrey Fox (Indiana University); David Kanter (MLCommons); Thorsten Kurth (NVIDIA Corporation); Peter Mattson (Google LLC); Dawei Mu (National Center for Supercomputing Applications (NCSA)); Amit Ruhela (Texas Advanced Computing Center (TACC), University of Texas); Kento Sato (RIKEN Center for Computational Science (R-CCS)); Koichi Shirahata and Tsuguchika Tabaru (Fujitsu Ltd); Aristeidis Tsaris (Oak Ridge National Laboratory (ORNL)); Jan Balewski (Lawrence Berkeley National Laboratory (LBNL)); Ben Cumming (Swiss National Supercomputing Centre (CSCS)); Takumi Danjo (Fujitsu Ltd); Jens Domke and Takaaki Fukai (RIKEN Center for Computational Science (R-CCS)); Naoto Fukumoto and Tatsuya Fukushi (Fujitsu Ltd); Balazs Gerofi (RIKEN Center for Computational Science (R-CCS)); Takumi Honda (Fujitsu Ltd); Toshiyuki Imamura (RIKEN Center for Computational Science (R-CCS)); Akihiko Kasagi and Kentaro Kawakami (Fujitsu Ltd); Shuhei Kudo and Akiyoshi Kuroda (RIKEN Center for Computational Science (R-CCS)); Maxime Martinasso (Swiss National Supercomputing Centre (CSCS)); Satoshi Matsuoka (RIKEN Center for Computational Science (R-CCS)); Henrique Mendonça (Swiss National Supercomputing Centre (CSCS)); Kazuki Minami (RIKEN Center for Computational Science (R-CCS)); Prabhat Ram (Microsoft Corporation); Takashi Sawada (Fujitsu Ltd); Mallikarjun Shankar (Oak Ridge National Laboratory (ORNL)); Tom St. John (Cruise); Akihiro Tabuchi (Fujitsu Ltd); Venkatram Vishwanath (Argonne National Laboratory (ANL)); Mohamed Wahib (National Institute of Advanced Industrial Science and Technology (AIST), Japan; RIKEN Center for Computational Science (R-CCS)); Masafumi Yamazaki (Fujitsu Ltd); and Junqi Yin (Oak Ridge National Laboratory (ORNL))

Abstract: Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf(TM) is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications driven by the MLCommons(TM) Association. We present the results from the first submission round, including a diverse set of some of the world's largest HPC systems, along with a systematic framework for their joint analysis and insights on implementations. Furthermore, we characterize each benchmark with compute, memory and I/O behaviours to parameterize extended roofline performance models.

Website: https://drive.google.com/drive/u/2/folders/1Vyz-Fpe_tEPU1HqeFwOTePXHlCF8wmK1
