SC21 Proceedings

SC Paper Presentation Archives

  1. 3D Acoustic-Elastic Coupling with Gravity: The Dynamics of the 2018 Palu, Sulawesi Earthquake and Tsunami Lukas Krenz (Technical University Munich); Carsten Uphoff, Thomas Ulrich, and Alice-Agnes Gabriel (Ludwig Maximilian University of Munich); Lauren S. Abrahams and Eric M. Dunham (Stanford University); and Michael Bader (Technical University Munich)

  2. A 400 Trillion-Grid Vlasov Simulation on Fugaku Supercomputer: Large-Scale Distribution of Cosmic Relic Neutrinos in a Six-Dimensional Phase Space Kohji Yoshikawa (University of Tsukuba); Satoshi Tanaka (Kyoto University, Japan); and Naoki Yoshida (University of Tokyo)

  3. #COVIDisAirborne: AI-Enabled Multiscale Computational Microscopy of Delta SARS-CoV-2 in a Respiratory Aerosol Team #COVIDisAirborne (University of California, San Diego)

  4. Accelerating All-Electron Ab Initio Simulation of Raman Spectra for Biological Systems Honghui Shang (Institute of Computing Technology, Chinese Academy of Sciences); Fang Li (National Supercomputing Center in Wuxi); Yunquan Zhang and Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences); Libo Zhang (National Supercomputing Center in Wuxi); Mingchuan Wu and Yangjun Wu (Institute of Computing Technology, Chinese Academy of Sciences); Di Wei (Tsinghua University, China); Huimin Cui (Institute of Computing Technology, Chinese Academy of Sciences); Xin Liu (National Supercomputing Center in Wuxi); Fei Wang (Tsinghua University, China); Yuxi Ye and Yingxiang Gao (Institute of Computing Technology, Chinese Academy of Sciences); Shuang Ni (Laser Fusion Research Center, China Academy of Engineering Physics); Xin Chen (National Supercomputing Center in Wuxi); and Dexun Chen (Tsinghua University, China)

  5. Accelerating Applications using Edge Tensor Processing Units Kuan-Chieh Hsu and Hung-Wei Tseng (University of California, Riverside)

  6. Accelerating Bandwidth-Bound Deep Learning Inference with Main-Memory Accelerators Benjamin Y. Cho (University of Texas, Advanced Micro Devices (AMD) Inc) and Jeageun Jung and Mattan Erez (University of Texas)

  7. Accelerating Large Scale de Novo Metagenome Assembly Using GPUs Muaaz Gul Awan, Steven Hofmeyr, Rob Egan, Nan Ding, Aydin Buluc, Jack Deslippe, and Leonid Oliker (Lawrence Berkeley National Laboratory (LBNL)) and Katherine Yelick (University of California, Berkeley, Lawrence Berkeley National Laboratory (LBNL))

  8. Accelerating XOR-Based Erasure Coding Using Program Optimization Techniques Yuya Uezato (Dwango Ltd, Japan)

  9. AgEBO-Tabular: Joint Neural Architecture and Hyperparameter Search with Autotuned Data-Parallel Training for Tabular Data Romain Egele (Ecole Polytechnique, France; Argonne National Laboratory (ANL)); Prasanna Balaprakash (Argonne National Laboratory (ANL)); Isabelle Guyon (National Institute for Research in Computer Science and Automation (Inria), France; University of Paris-Saclay); Venkatram Vishwanath, Fangfang Xia, and Rick Stevens (Argonne National Laboratory (ANL)); and Zhengying Liu (National Institute for Research in Computer Science and Automation (Inria), France)

  10. Anton 3: Twenty Microseconds of Molecular Dynamics Simulation Before Lunch David E. Shaw, Mark Moraes, and Team D. E. Shaw Research (DE Shaw Research LLC)

  11. APNN-TC: Accelerating Arbitrary Precision Neural Networks on Ampere GPU Tensor Cores Boyuan Feng and Yuke Wang (University of California, Santa Barbara); Tong Geng and Ang Li (Pacific Northwest National Laboratory (PNNL)); and Yufei Ding (University of California, Santa Barbara)

  12. Arithmetic-Intensity-Guided Fault Tolerance for Neural Network Inference on GPUs Jack Kosaian and K. V. Rashmi (Carnegie Mellon University)

  13. BAASH: Lightweight, Efficient and Reliable Blockchain-As-A-Service for HPC Systems Abdullah Al Mamun, Feng Yan, and Dongfang Zhao (University of Nevada, Reno)

  14. Billion Atom Molecular Dynamics Simulations of Carbon at Extreme Conditions and Experimental Time and Length Scales Kien Nguyen Cong and Jonathan T. Willman (University of South Florida); Stan G. Moore (Sandia National Laboratories); Anatoly B. Belonoshko (KTH Royal Institute of Technology, Sweden); Rahulkumar Gayatri (Lawrence Berkeley National Laboratory (LBNL)); Evan Weinberg (NVIDIA Corporation); Mitchell A. Wood and Aidan P. Thompson (Sandia National Laboratories); and Ivan I. Oleynik (University of South Florida)

  15. Bootstrapping In-Situ Workflow Auto-Tuning via Combining Performance Models of Component Applications Tong Shu (Southern Illinois University); Yanfei Guo and Justin Wozniak (Argonne National Laboratory (ANL)); Xiaoning Ding (New Jersey Institute of Technology); Ian Foster (Argonne National Laboratory (ANL), University of Chicago); and Tahsin Kurc (Stony Brook University)

  16. CAKE: Matrix Multiplication Using Constant-Bandwidth Blocks H.T. Kung, Vikas Natesh, and Andrew Sabot (Harvard University)

  17. Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters Qinghao Hu (Nanyang Technological University, Singapore; School of Computer Science and Engineering); Peng Sun and Shengen Yan (SenseTime); and Yonggang Wen and Tianwei Zhang (Nanyang Technological University, Singapore; School of Computer Science and Engineering)

  18. Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines Shigang Li and Torsten Hoefler (ETH Zürich)

  19. Clairvoyant Prefetching for Distributed Machine Learning I/O Nikoli Dryden (ETH Zürich, Lawrence Livermore National Laboratory) and Roman Böhringer, Tal Ben-Nun, and Torsten Hoefler (ETH Zürich)

  20. Closing the “Quantum Supremacy” Gap: Achieving Real-Time Simulation of a Random Quantum Circuit Using a New Sunway Supercomputer Haohuan Fu (Tsinghua University, China) and Team SWQSIM (Various)

  21. cuTS: Scaling Subgraph Isomorphism on Distributed Multi-GPU Systems Using Trie Based Data Structure Lizhi Xiang (Washington State University), Ariful Khan (Pacific Northwest National Laboratory (PNNL)), Edoardo Serra (Boise State University), Mahantesh M. Halappanavar (Pacific Northwest National Laboratory (PNNL)), and Aravind Sukumaran-Rajam (Washington State University)

  22. Cuttlefish: Library for Achieving Energy Efficiency in Multicore Parallel Programs Sunil Kumar, Akshat Gupta, and Vivek Kumar (Indraprastha Institute of Information Technology (IIIT), Delhi) and Sridutt Bhalachandra (Lawrence Berkeley National Laboratory (LBNL))

  23. Data-Driven Scalable Pipeline Using National Agent-Based Models for Real-Time Pandemic Response and Decision Support Parantapa Bhattacharya, Jiangzhuo Chen, Stefan Hoops, Dustin Machi, Madhav Marathe, and Team University of Virginia (University of Virginia)

  24. DeltaFS: A Scalable No-Ground-Truth Filesystem For Massively-Parallel Computing Qing Zheng, Charles D. Cranor, Gregory R. Ganger, Garth A. Gibson, and George Amvrosiadis (Carnegie Mellon University) and Bradley W. Settlemyer and Gary A. Grider (Los Alamos National Laboratory)

  25. Digital Transformation of Droplet/Aerosol Infection Risk Assessment Realized on Fugaku for the Fight against COVID-19 Kazuto Ando, Rahul Bale, ChungGang Li, Satoshi Matsuoka, Keiji Onishi, and Makoto Tsubokura (RIKEN Center for Computational Science (R-CCS))

  26. Discovering and Balancing Fundamental Cycles in Large Signed Graphs Ghadeer Alabandi, Jelena Tešić, Lucas Rusnak, and Martin Burtscher (Texas State University)

  27. DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks Vasimuddin Md, Sanchit Misra, Guixiang Ma, Ramanarayan Mohanty, Evangelos Georganas, Alexander Heinecke, Dhiraj Kalamkar, Nesreen K. Ahmed, and Sasikanth Avancha (Intel Corporation)

  28. Distributed Multigrid Neural Solver on Megavoxel Domains Aditya Balu (Iowa State University), Sergio Botelho (RocketML Inc), Biswajit Khara (Iowa State University), Vinay Rao (RocketML Inc), Soumik Sarkar (Iowa State University), Chinmay Hegde (New York University (NYU)), Adarsh Krishnamurthy (Iowa State University), Santi Adavani (RocketML Inc), and Baskar Ganapathysubramanian (Iowa State University)

  29. Distributed Quantum Computing with QMPI Thomas Häner and Damian S. Steiger (Microsoft Corporation), Torsten Hoefler (ETH Zürich), and Matthias Troyer (Microsoft Corporation)

  30. Dr. Top-k: Delegate-Centric Top-k Computation on GPUs Anil Gaihre (Stevens Institute of Technology); Da Zheng (Johns Hopkins University); Scott Weitze (Stevens Institute of Technology); Lingda Li (Brookhaven National Laboratory); Shuaiwen Leon Song (University of Sydney, University of Washington); Caiwen Ding (University of Connecticut); Xiaoye S. Li (Lawrence Berkeley National Laboratory (LBNL)); and Hang Liu (Stevens Institute of Technology)

  31. Edge-Based Hyperdimensional Learning System with Brain-Like Neural Adaptation Zhuowen Zou (University of California, San Diego); Yeseong Kim (Daegu Institute of Science and Technology, South Korea); Farhad Imani (University of Connecticut); Haleh Alimohamadi (University of California, San Diego); Rosario Cammarota (Intel Labs); and Mohsen Imani (University of California, Irvine)

  32. Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM Deepak Narayanan (Stanford University); Mohammad Shoeybi, Jared Casper, Patrick LeGresley, Mostofa Patwary, Vijay Anand Korthikanti, Dmitri Vainbrand, Prethvi Kashinkunti, Julie Bernauer, and Bryan Catanzaro (NVIDIA Corporation); Amar Phanishayee (Microsoft Research); and Matei Zaharia (Stanford University)

  33. Efficient Scaling of Dynamic Graph Neural Networks Venkatesan T. Chakaravarthy, Shivmaran S. Pandian, Saurabh Raje, and Yogish Sabharwal (IBM Research) and Toyotaro Suzumura and Shashanka Ubaru (IBM TJ Watson Research Center)

  34. Efficient Tensor Core-Based GPU Kernels for Structured Sparsity Under Reduced Precision Zhaodong Chen, Zheng Qu, Liu Liu, Yufei Ding, and Yuan Xie (University of California, Santa Barbara)

  35. ElGA: Elastic and Scalable Dynamic Graph Analysis Kasimir Gabert (Georgia Institute of Technology, Sandia National Laboratories); Kaan Sancak and M. Yusuf Ozkaya (Georgia Institute of Technology); Ali Pinar (Sandia National Laboratories); and Umit V. Catalyurek (Amazon Web Services, Georgia Institute of Technology)

  36. Empirical Evaluation of Circuit Approximations on Noisy Quantum Devices Ellis Wilson and Frank Mueller (North Carolina State University) and Lindsay Bassman and Costin Iancu (Lawrence Berkeley National Laboratory (LBNL))

  37. Enable Simultaneous DNN Services Based on Deterministic Operator Overlap and Precise Latency Prediction Weihao Cui, Han Zhao, and Quan Chen (Shanghai Jiao Tong University); Ningxin Zheng (Microsoft Research Asia); Jingwen Leng and Jieru Zhao (Shanghai Jiao Tong University); Zhuo Song, Tao Ma, and Yong Yang (Alibaba Cloud); and Chao Li and Minyi Guo (Shanghai Jiao Tong University)

  38. Enabling and Scaling the HPCG Benchmark on the Newest Generation Sunway Supercomputer with 42 Million Heterogeneous Cores Qianchao Zhu (Center for Data Science, Peking University); Hao Luo (School of Mathematical Sciences, Peking University); Chao Yang (School of Mathematical Sciences, Peking University; National Engineering Laboratory for Big Data Analysis and Applications, Peking University); Mingshuo Ding (School of Electronics Engineering and Computer Science, Peking University); and Wanwang Yin and Xinhui Yuan (National Research Center of Parallel Computer Engineering and Technology, China)

  39. Enabling Large-Scale Correlated Electronic Structure Calculations: Scaling the RI-MP2 Method on Summit Giuseppe Barca (Australian National University); Jorge Gálvez Vallejo, David Poole, and Melisa Alkan (Iowa State University); Ryan Stocks (Australian National University); Alistair Rendell (Flinders University, Australia); and Mark Gordon (Iowa State University)

  40. Error-Controlled, Progressive, and Adaptable Retrieval of Scientific Data with Multilevel Decomposition Xin Liang (Missouri University of Science and Technology); Qian Gong, Jieyang Chen, Ben Whitney, and Lipeng Wan (Oak Ridge National Laboratory (ORNL)); Qing Liu (New Jersey Institute of Technology); and David Pugmire, Rick Archibald, Norbert Podhorszki, and Scott Klasky (Oak Ridge National Laboratory (ORNL))

  41. ET: Re-Thinking Self-Attention for Transformer Models on GPUs Shiyang Chen (Stevens Institute of Technology), Shaoyi Huang (University of Connecticut), Santosh Pandey (Stevens Institute of Technology), Bingbing Li (University of Connecticut), Guang R. Gao and Long Zheng (University of Delaware), Caiwen Ding (University of Connecticut), and Hang Liu (Stevens Institute of Technology)

  42. Exploiting User Activeness for Data Retention in HPC Systems Wei Zhang (Texas Tech University), Suren Byna (Lawrence Berkeley National Laboratory (LBNL)), Hyogi Sim (Virginia Tech), SangKeun Lee (Oak Ridge National Laboratory (ORNL)), Sudharshan Vazhkudai (Micron Technology Inc), and Yong Chen (Texas Tech University)

  43. Extreme-Scale Ab Initio Quantum Raman Spectra Simulations on the Leadership HPC System in China Honghui Shang (Institute of Computing Technology, Chinese Academy of Sciences); Fang Li (National Supercomputing Center in Wuxi); Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); Libo Zhang (National Supercomputing Center in Wuxi); You Fu (Shandong University of Science and Technology); Yingxiang Gao and Yangjun Wu (Institute of Computing Technology, Chinese Academy of Sciences); Xiaohui Duan and Rongfen Lin (Tsinghua University, China); Xin Liu (National Supercomputing Center in Wuxi); Ying Liu (Institute of Computing Technology, Chinese Academy of Sciences); and Dexun Chen (Tsinghua University, China)

  44. FastZ: Accelerating Gapped Whole Genome Alignment on GPUs Sree Charan Gundabolu, T. N. Vijaykumar, and Mithuna Thottethodi (Purdue University)

  45. FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers Zheng Chai and Yujing Chen (George Mason University); Ali Anwar (IBM Research, Almaden); Liang Zhao (Emory University); and Yue Cheng and Huzefa Rangwala (George Mason University)

  46. FEP-Based Large-Scale Virtual Screening for Effective Drug Discovery against COVID-19 Zhe Li (Sun Yat-sen University, Guangzhou, China); Chengkun Wu and Yishui Li (State Key Laboratory of High Performance Computing, Changsha); Runduo Liu (Sun Yat-sen University, Guangzhou, China); Kai Lu, Ruibo Wang, Jie Liu, and Chunye Gong (State Key Laboratory of High Performance Computing, Changsha); Canqun Yang (National Supercomputing Center, Tianjin); Xin Wang (Ocean University of China, Qingdao); Chang-Guo Zhan (University of Kentucky); and Hai-Bin Luo (Sun Yat-sen University, Guangzhou, China)

  47. Flare: Flexible In-Network Allreduce Daniele De Sensi, Salvatore Di Girolamo, Saleh Ashkboos, Shigang Li, and Torsten Hoefler (ETH Zürich)

  48. G-SEPM: Building an Accurate and Efficient Soft Error Prediction Model for GPGPUs Hengshan Yue and Xiaohui Wei (Jilin University, China); Guangli Li (Institute of Computing Technology, Chinese Academy of Sciences); and Jianpeng Zhao, Nan Jiang, and Jingweijia Tan (Jilin University, China)

  49. Generalizable Coordination of Large Multiscale Ensembles: Challenges and Learnings at Scale Harsh Bhatia, Francesco Di Natale, Joseph Y. Moon, Xiaohua Zhang, Joseph R. Chavez, and Fikret Aydin (Lawrence Livermore National Laboratory); Chris Stanley (Oak Ridge National Laboratory (ORNL)); Tomas Oppelstrup (Lawrence Livermore National Laboratory); Chris Neale (Los Alamos National Laboratory); Sara Kokkila Schumacher (IBM TJ Watson Research Center); Dong Ahn, Stephen Herbein, and Timothy S. Carpenter (Lawrence Livermore National Laboratory); Sandrasegaram Gnanakaran (Los Alamos National Laboratory); and Peer-Timo Bremer, James N. Glosli, Felice C. Lightstone, and Helgi I. Ingólfsson (Lawrence Livermore National Laboratory)

  50. Hardware Acceleration of Tensor-Structured Multilevel Ewald Summation Method on MDGRAPE-4A, a Special-Purpose Computer System for Molecular Dynamics Simulations Gentaro Morimoto and Yohei Koyama (RIKEN Center for Biosystems Dynamics Research); Hao Zhang (Gusu Laboratory of Materials, Suzhou, China); and Teruhisa Komatsu, Yousuke Ohno, Keigo Nishida, Itta Ohmura, Hiroshi Koyama, and Makoto Taiji (RIKEN Center for Biosystems Dynamics Research)

  51. Hardware-Supported Remote Persistence for Distributed Persistent Memory Zhuohui Duan, Haodi Lu, Haikun Liu, Xiaofei Liao, Hai Jin, Yu Zhang, and Song Wu (Huazhong University of Science and Technology)

  52. HatRPC: Hint-Accelerated Thrift RPC over RDMA Tianxi Li and Haiyang Shi (Ohio State University) and Xiaoyi Lu (University of California, Merced)

  53. The Hidden Cost of the Edge: A Performance Comparison of Edge and Cloud Latencies Ahmed Ali-Eldin (Chalmers University of Technology, Sweden; University of Massachusetts, Amherst) and Bin Wang and Prashant Shenoy (University of Massachusetts, Amherst)

  54. High Performance Uncertainty Quantification with Parallelized Multilevel Markov Chain Monte Carlo Linus Seelinger (Heidelberg University); Anne Reinarz (Durham University, England); Leonhard Rannabauer and Michael Bader (Technical University Munich); and Peter Bastian and Robert Scheichl (Heidelberg University)

  55. High-Throughput Virtual Screening of Small Molecule Inhibitors for SARS-CoV-2 Protein Targets with Deep Fusion Models Garrett A. Stevenson, Derek Jones, Hyojin Kim, W.F. Drew Bennett, Brian J. Bennion, Monica Borucki, Feliza Bourguet, Aidan Epstein, and Magdalena Franco (Lawrence Livermore National Laboratory); Brooke Harmon (Sandia National Laboratories); Stewart He (Lawrence Livermore National Laboratory); Max P. Katz (NVIDIA Corporation); Daniel Kirshner, Victoria Lao, Edmond Y. Lau, Jacky Lo, and Kevin McLoughlin (Lawrence Livermore National Laboratory); Richard Mosesso (Sandia National Laboratories); Deepa K. Murugesh (Lawrence Livermore National Laboratory); Oscar A. Negrete (Sandia National Laboratories); Edwin A. Saada and Brent Segelke (Lawrence Livermore National Laboratory); Maxwell Stefan (Sandia National Laboratories); and Marisa W. Torres, Dina Weilhammer, Sergio Wong, Yue Yang, Adam Zemla, Xiaohua Zhang, Fangqiang Zhu, Felice C. Lightstone, and Jonathan E. Allen (Lawrence Livermore National Laboratory)

  56. HPAC: Evaluating Approximate Computing Techniques on HPC OpenMP Applications Konstantinos Parasyris, Giorgis Georgakoudis, Harshitha Menon, James Diffenderfer, Ignacio Laguna, Daniel Osei-Kuffuor, and Markus Schordan (Lawrence Livermore National Laboratory)

  57. Hybrid, Scalable, Trace-Driven Performance Modeling of GPGPUs Yehia Arafa (New Mexico State University); Abdel-Hameed Badawy (New Mexico State University, Los Alamos National Laboratory); Ammar ElWazir and Atanu Barai (New Mexico State University); Ali Eker (State University of New York at Binghamton); Gopinath Chennupati (Amazon Alexa); and Nandakishore Santhi and Stephan Eidenbenz (Los Alamos National Laboratory)

  58. In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated Computing Tyler Allen and Rong Ge (Clemson University)

  59. Index Launches: Scalable, Flexible Representation of Parallel Task Groups Rupanshu Soi (Birla Institute of Technology and Science Pilani, Hyderabad Campus); Michael Bauer, Sean Treichler, Manolis Papadakis, and Wonchan Lee (NVIDIA Corporation); Patrick McCormick (Los Alamos National Laboratory); Alex Aiken (Stanford University); and Elliott Slaughter (SLAC National Accelerator Laboratory)

  60. Intelligent Resolution: Integrating Cryo-EM with AI-Driven Multi-Resolution Simulations to Observe the SARS-CoV-2 Replication-Transcription Machinery in Action Anda Trifan and Defne Gorgun (University of Illinois), Zongyi Li (California Institute of Technology), Alexander Brace (University of Chicago), Maxim Zvyagin and Heng Ma (Argonne National Laboratory (ANL)), Anima Anandkumar (California Institute of Technology), Venkatram Vishwanath (Argonne National Laboratory (ANL)), John E. Stone (University of Illinois), Sarah A. Harris (University of Leeds), and Arvind Ramanathan and Team Intelligent Resolution (Argonne National Laboratory (ANL))

  61. KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks J. Gregory Pauloski (University of Chicago); Qi Huang (University of Texas); Lei Huang (Texas Advanced Computing Center (TACC)); Shivaram Venkataraman (University of Wisconsin, Madison); Kyle Chard and Ian Foster (University of Chicago, Argonne National Laboratory (ANL)); and Zhao Zhang (Texas Advanced Computing Center (TACC))

  62. Krill: A Compiler and Runtime System for Concurrent Graph Processing Hongzheng Chen, Minghua Shen, Nong Xiao, and Yutong Lu (Sun Yat-sen University, Guangzhou, China)

  63. Language Models for the Prediction of SARS-CoV-2 Inhibitors Andrew E. Blanchard, John Gounley, Debsindhu Bhowmik, Mayanka Chandra Shekar, Isaac Lyngaas, Shang Gao, Junqi Yin, Aristeidis Tsaris, Feiyi Wang, and Jens Glaser (Oak Ridge National Laboratory (ORNL))

  64. LCCG: A Locality-Centric Hardware Accelerator for High Throughput of Concurrent Graph Processing Jin Zhao, Yu Zhang, and Xiaofei Liao (Huazhong University of Science and Technology); Ligang He (University of Warwick); Bingsheng He (National University of Singapore); and Hai Jin and Haikun Liu (Huazhong University of Science and Technology)

  65. LibShalom: Optimizing Small and Irregular-Shaped Matrix Multiplications on ARMv8 Multi-Cores Weiling Yang, Jianbin Fang, Dezun Dong, and Xing Su (National University of Defense Technology (NUDT), China) and Zheng Wang (University of Leeds)

  66. Linux vs. Lightweight Multi-Kernels for High-Performance Computing: Experiences at Pre-Exascale Balazs Gerofi (RIKEN); Kohei Tarumizu, Lei Zhang, and Takayuki Okamoto (Fujitsu Ltd); Masamichi Takagi (RIKEN); Shinji Sumimoto (Fujitsu Ltd); and Yutaka Ishikawa (RIKEN)

  67. LMFF: Efficient and Scalable Layered Materials Force Field on Heterogeneous Many-Core Processors Ping Gao (Shandong University, National Supercomputing Center in Wuxi); Xiaohui Duan (Tsinghua University, China; National Supercomputing Center in Wuxi); Jiaxu Guo (Jilin University, China; National Supercomputing Center in Wuxi); Jin Wang (Tsinghua University, China); Zhenya Song (First Institute of Oceanography and Key Laboratory of Marine Science and Numerical Modeling, Ministry of Natural Resources); Lizhen Cui and Xiangxu Meng (Shandong University); Xin Liu (National Supercomputing Center in Wuxi); Wusheng Zhang (Alkan University, China; National Supercomputing Center in Wuxi); Ming Ma (Tsinghua University, China); Guohui Li (Dalian Institute of Chemical Physics, Chinese Academy of Sciences); Dexun Chen, Haohuan Fu, and Wei Xue (Tsinghua University, China; National Supercomputing Center in Wuxi); Weiguo Liu (Shandong University, National Supercomputing Center in Wuxi); and Guangwen Yang (Tsinghua University, China; National Supercomputing Center in Wuxi)

  68. LogECMem: Coupling Erasure-Coded In-Memory Key-Value Stores with Parity Logging Liangfeng Cheng, Yuchong Hu, Zhaokang Ke, Jia Xu, Qiaori Yao, and Dan Feng (Huazhong University of Science and Technology) and Weichun Wang and Wei Chen (Hikvision LTD, China)

  69. Lunule: An Agile and Judicious Metadata Load Balancer for CephFS Yiduo Wang, Cheng Li, Xinyang Shao, and Youxu Chen (University of Science and Technology of China (USTC)); Feng Yan (University of Nevada, Reno); and Yinlong Xu (University of Science and Technology of China (USTC))

  70. MAPA: Multi-Accelerator Pattern Allocation Policy for Multi-Tenant GPU Servers Kiran Ranganath (University of California, Riverside); Joshua D. Suetterlein and Joseph Manzano (Pacific Northwest National Laboratory (PNNL)); Shuaiwen Leon Song (University of Sydney); and Daniel Wong (University of California, Riverside)

  71. Meeting the Real-Time Challenges of Ground-Based Telescopes Using Low-Rank Matrix Computations Hatem Ltaief (King Abdullah University of Science and Technology (KAUST)); Jesse Cranney (Australian National University); Damien Gratadour (Australian National University, Paris Observatory); Yuxi Hong (King Abdullah University of Science and Technology (KAUST)); Laurent Gatineau (NEC Corporation); and David Keyes (King Abdullah University of Science and Technology (KAUST))

  72. Minimizing Privilege for Building HPC Containers Reid Priedhorsky (Los Alamos National Laboratory); R. Shane Canon (National Energy Research Scientific Computing Center (NERSC), Lawrence Berkeley National Laboratory (LBNL)); Timothy Randles (Los Alamos National Laboratory); and Andrew J. Younge (Sandia National Laboratories)

  73. ndzip-gpu: Efficient Lossless Compression of Scientific Floating-Point Data on GPUs Fabian Knorr, Peter Thoman, and Thomas Fahringer (University of Innsbruck)

  74. A Next-Generation Discontinuous Galerkin Fluid Dynamics Solver with Application to High-Resolution Lung Airflow Simulations Martin Kronbichler (Technical University Munich; Uppsala University, Sweden); Niklas Fehn (Leibniz Supercomputing Centre); Peter Munch (Technical University Munich, Helmholtz-Zentrum Geesthacht); Maximilian Bergbauer (Technical University Munich); Karl-Robert Wichmann (Technical University Munich, Ebenbuild GmbH); Carolin Geitner (Technical University Munich); Momme Allalen (Leibniz Supercomputing Centre); and Martin Schulz and Wolfgang A. Wall (Technical University Munich)

  75. Non-Recurring Engineering (NRE) Best Practices: A Case Study with the NERSC/NVIDIA OpenMP Contract Christopher Daley (Lawrence Berkeley National Laboratory (LBNL)); Annemarie Southwell (NVIDIA Corporation); Rahulkumar Gayatri (Lawrence Berkeley National Laboratory (LBNL)); Scott Biersdorff, Craig Toepfer, and Guray Ozen (NVIDIA Corporation); and Nicholas Wright (Lawrence Berkeley National Laboratory (LBNL))

  76. On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations Grzegorz Kwasniewski (ETH Zürich); Marko Kabić (Swiss National Supercomputing Centre (CSCS), ETH Zürich); Tal Ben-Nun, Alexandros Nikolaos Ziogas, Jens Eirik Saethre, André Gaillard, Timo Schneider, and Maciej Besta (ETH Zürich); Anton Kozhevnikov and Joost VandeVondele (Swiss National Supercomputing Centre (CSCS), ETH Zürich); and Torsten Hoefler (ETH Zürich)

  77. Online Evolutionary Batch Size Orchestration for Scheduling Deep Learning Workloads in GPU Clusters Zhengda Bian (National University of Singapore); Shenggui Li (Nanyang Technological University, Singapore); Wei Wang (ByteDance Ltd); and Yang You (National University of Singapore)

  78. Online Optimization of File Transfers in High-Speed Networks Md Arifuzzaman and Engin Arslan (University of Nevada, Reno)

  79. Overcoming Barriers to Scalability in Variational Quantum Monte Carlo Tianchen Zhao, Brian Chen, and Saibal De (University of Michigan); James Stokes (Flatiron Institute); and Shravan Veerapaneni (University of Michigan, Flatiron Institute)

  80. PAGANI: A Parallel Adaptive GPU Algorithm for Numerical Integration Ioannis Sakiotis (Old Dominion University); Kamesh Arumugam (NVIDIA Corporation); Marc Paterno (Fermi National Accelerator Laboratory); and Desh Ranjan, Balsa Terzic, and Mohammad Zubair (Old Dominion University)

  81. Parallel Construction of Module Networks Ankit Srivastava, Sriram Chockalingam, Maneesha Aluru, and Srinivas Aluru (Georgia Institute of Technology)

  82. Paths to OpenMP in the Kernel Jiacheng Ma, Wenyi Wang, Aaron Nelson, Michael Cuevas, and Brian Homerding (Northwestern University); Conghao Liu (Illinois Institute of Technology); Zhen Huang and Simone Campanoni (Northwestern University); Kyle Hale (Illinois Institute of Technology); and Peter Dinda (Northwestern University)

  83. Peppa-X: Finding Program Test Inputs to Bound Silent Data Corruption Vulnerability in HPC Applications Md Hasanur Rahman and Aabid Shamji (University of Iowa), Shengjian Guo (Baidu Security), and Guanpeng Li (University of Iowa)

  84. Pilgrim: Scalable and (Near) Lossless MPI Tracing Chen Wang (University of Illinois), Pavan Balaji (Facebook), and Marc Snir (University of Illinois)

  85. Pinpointing Crash-Consistency Bugs in the HPC I/O Stack: A Cross-Layer Approach Jinghan Sun, Jian Huang, and Marc Snir (University of Illinois)

  86. Preparing an Incompressible-Flow Fluid Dynamics Code for Exascale-Class Wind Energy Simulations Paul Mullowney (National Renewable Energy Laboratory (NREL)); Ruipeng Li (Lawrence Livermore National Laboratory); Stephen Thomas, Shreyas Ananthan, Ashesh Sharma, and Jon Rood (National Renewable Energy Laboratory (NREL)); Alan B. Williams (Sandia National Laboratories); and Michael Sprague (National Renewable Energy Laboratory (NREL))

  87. Productivity, Portability, Performance: Data-Centric Python Alexandros Nikolaos Ziogas, Timo Schneider, Tal Ben-Nun, Alexandru Calotoiu, Tiziano De Matteis, Johannes de Fine Licht, Luca Lavarini, and Torsten Hoefler (ETH Zürich)

  88. Reducing Redundancy in Data Organization and Arithmetic Calculation for Stencil Computations Kun Li (Institute of Computing Technology, Chinese Academy of Sciences; Chinese Academy of Sciences); Liang Yuan and Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); and Yue Yue (Institute of Computing Technology, Chinese Academy of Sciences; Chinese Academy of Sciences)

  89. Representation of Women in HPC Conferences Eitan Frachtenberg and Rhody Kaner (Reed College)

  90. Resilient Error-Bounded Lossy Compressor for Data Transfer Sihuan Li (University of California, Riverside); Sheng Di (Argonne National Laboratory (ANL)); Kai Zhao (University of California, Riverside); Xin Liang (Missouri University of Science and Technology); Zizhong Chen (University of California, Riverside); and Franck Cappello (Argonne National Laboratory (ANL))

  91. Revealing Power, Energy, and Thermal Dynamics of a 200PF Pre-Exascale Supercomputer Woong Shin, Vladyslav Oles, Ahmad Maroof Karimi, J. Austin Ellis, and Feiyi Wang (Oak Ridge National Laboratory (ORNL))

  92. Reverse-Mode Automatic Differentiation and Optimization of GPU Kernels via Enzyme William S. Moses and Valentin Churavy (Massachusetts Institute of Technology (MIT)); Ludger Paehler (Technical University Munich); and Jan Hückelheim, Sri Hari Krishna Narayanan, Michel Schanen, and Johannes Doerfert (Argonne National Laboratory (ANL))

  93. Ribbon: Cost-Effective and QoS-Aware Deep Learning Model Inference Using a Diverse Pool of Cloud Computing Instances Baolin Li, Rohan Roy, and Tirthak Patel (Northeastern University); Vijay Gadepally and Karen Gettings (Massachusetts Institute of Technology (MIT) Lincoln Laboratory); and Devesh Tiwari (Northeastern University)

  94. Scalable Adaptive PDE Solvers in Arbitrary Domains Kumar Saurabh (Iowa State University); Masado Ishii and Milinda Fernando (University of Utah); Boshun Gao, Kendrick Tan, Ming-Chen Hsu, and Adarsh Krishnamurthy (Iowa State University); Hari Sundar (University of Utah); and Baskar Ganapathysubramanian (Iowa State University)

  95. Scalable FBP Decomposition for Cone-Beam CT Reconstruction Peng Chen and Mohamed Wahib (National Institute of Advanced Industrial Science and Technology (AIST), RIKEN Center for Computational Science (R-CCS)); Xiao Wang (Oak Ridge National Laboratory (ORNL), Boston Children's Hospital); Takahiro Hirofuchi and Hirotaka Ogawa (National Institute of Advanced Industrial Science and Technology (AIST)); Ander Biguri (Institute of Nuclear Medicine, University College London); Richard Boardman and Thomas Blumensath (University of Southampton, μ-VIS X-Ray Imaging Centre); and Satoshi Matsuoka (RIKEN Center for Computational Science (R-CCS), Tokyo Institute of Technology)

  96. SEEC: Stochastic Escape Express Channel Mayank Parasar (Georgia Institute of Technology); Natalie Enright Jerger (University of Toronto); Paul Gratz (Texas A&M University); Joshua San Miguel (University of Wisconsin, Madison); and Tushar Krishna (Georgia Institute of Technology)

  97. Simurgh: A Fully Decentralized and Secure NVMM User Space File System Nafiseh Moti, Frederic Schimmelpfennig, Reza Salkhordeh, and David Klopp (Johannes Gutenberg University Mainz); Toni Cortes (Polytechnic University of Catalonia); Ulrich Rückert (Bielefeld University, Germany); and André Brinkmann (Johannes Gutenberg University Mainz)

  98. Single-Node Partitioned-Memory for Huge Graph Analytics: Cost and Performance Trade-Offs Sayan Ghosh and Nathan Tallent (Pacific Northwest National Laboratory (PNNL)); Marco Minutoli and Mahantesh Halappanavar (Pacific Northwest National Laboratory (PNNL), Washington State University); Ramesh Peri (Facebook); and Ananth Kalyanaraman (Washington State University, Pacific Northwest National Laboratory (PNNL))

  99. STM-Multifrontal QR: Streaming Task Mapping Multifrontal QR Factorization Empowered by GCN Shengle Lin, Wangdong Yang, Haotian Wang, Qinyun Tsai, and Kenli Li (Hunan University)

  100. SV-Sim: Scalable PGAS-Based State Vector Simulation of Quantum Circuits Ang Li and Bo Fang (Pacific Northwest National Laboratory (PNNL)); Christopher Granade, Guen Prawiroatmodjo, Bettina Heim, and Martin Roetteler (Microsoft Corporation); and Sriram Krishnamoorthy (Pacific Northwest National Laboratory (PNNL))

  101. SW_Qsim: A Minimize-Memory Quantum Simulator with High-Performance on a New Sunway Supercomputer Fang Li, Xin Liu, Yong Liu, Pengpeng Zhao, and Yuling Yang (National Supercomputing Center in Wuxi); Honghui Shang (Institute of Computing Technology, Chinese Academy of Sciences); Weizhe Sun, Zhen Wang, and Enming Dong (National Supercomputing Center in Wuxi); and Dexun Chen (Tsinghua University, China)

  102. Symplectic Structure-Preserving Particle-in-Cell Whole-Volume Simulation of Tokamak Plasmas to 111.3 Trillion Particles and 25.7 Billion Grids Jianyuan Xiao (University of Science and Technology of China) and Team SymPIC (Various)

  103. Systematically Inferring I/O Performance Variability by Examining Repetitive Job Behavior Emily Costa and Tirthak Patel (Northeastern University); Benjamin Schwaller and James Brandt (Sandia National Laboratories); and Devesh Tiwari (Northeastern University)

  104. Temporal Vectorization for Stencils Liang Yuan, Hang Cao, Yunquan Zhang, Kun Li, Pengqi Lu, and Yue Yue (Institute of Computing Technology, Chinese Academy of Sciences)

  105. Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads Evangelos Georganas, Dhiraj Kalamkar, Sasikanth Avancha, Menachem Adelman, and Cristina Anderson (Intel Corporation); Alexander Breuer (Friedrich Schiller University Jena, Germany); and Jeremy Bruestle, Narendra Chaudhary, Abhisek Kundu, Denise Kutnick, Frank Laub, Vasimuddin Md, Sanchit Misra, Ramanarayan Mohanty, Hans Pabst, Barukh Ziv, and Alexander Heinecke (Intel Corporation)

  106. TensorKMC: Kinetic Monte Carlo Simulation of 50 Trillion Atoms Driven by Deep Learning on a New Generation of Sunway Supercomputer Honghui Shang (Institute of Computing Technology, Chinese Academy of Sciences); Xin Chen and Xingyu Gao (Institute of Applied Physics and Computational Mathematics); Rongfen Lin (Tsinghua University, China); Lifang Wang (Institute of Applied Physics and Computational Mathematics); Fang Li, Qian Xiao, and Qiang Sun (National Supercomputing Center in Wuxi); Lei Xu (Institute of Computing Technology, Chinese Academy of Sciences); Leilei Zhu (University of Science and Technology of China); Fei Wang (Tsinghua University, China); Yunquan Zhang (Institute of Computing Technology, Chinese Academy of Sciences); and Haifeng Song (Institute of Applied Physics and Computational Mathematics)

  107. TriPoll: Computing Surveys of Triangles in Massive-Scale Temporal Graphs with Metadata Trevor Steil, Tahsin Reza, Keita Iwabuchi, Benjamin W. Priest, Geoffrey Sanders, and Roger Pearce (Lawrence Livermore National Laboratory)

  108. Understanding, Predicting and Scheduling Serverless Workloads Under Partial Interference Laiping Zhao, Yanan Yang, Yiming Li, Xian Zhou, and Keqiu Li (Tianjin University, China)

  109. Whale: Efficient One-to-Many Data Partitioning in RDMA-Assisted Distributed Stream Processing Systems Jie Tan, Hanhua Chen, Yonghui Wang, and Hai Jin (Huazhong University of Science and Technology)

  110. ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning Samyam Rajbhandari, Olatunji Ruwase, Jeff Rasley, Shaden Smith, and Yuxiong He (Microsoft Corporation)
