Workshop: IA^3 2021: 11th Workshop on Irregular Applications: Architectures and Algorithms
Authors: Christopher Stone (National Institute of Aerospace), Aaron Walden (NASA), Mohammad Zubair (Old Dominion University), and Eric Nielsen (NASA)
Abstract: Computational performance of the FUN3D unstructured-grid computational fluid dynamics application on GPUs is highly dependent on the efficiency of floating-point atomic updates needed to support the irregular cell-, edge-, and node-based data access patterns in massively parallel GPU environments. We examine several optimization methods to improve GPU throughput on kernels that are dominated by atomic updates on NVIDIA V100/A100 and AMD MI100 GPUs. Optimization on the AMD MI100 GPU was of primary interest since similar hardware will be used in the upcoming Frontier supercomputer. Techniques combining register shuffling and on-chip shared memory were used to transpose and/or aggregate results among collaborating threads before atomically updating global memory. These techniques, along with algorithmic optimizations to reduce the update frequency, reduced the runtime of select kernels on the MI100 GPU by a factor of 2.5 to 6.0 relative to atomically updating global memory directly.
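The aggregation idea in the abstract can be illustrated with a small CPU-side emulation. This is a minimal sketch and not the authors' implementation: the names `Contribution`, `kGroupSize`, `scatter_direct`, and `scatter_aggregated` are hypothetical, the group size of 32 is an assumption, and a plain `+=` stands in for a floating-point atomic add. On a GPU, the inner combining step would presumably be done with register shuffles or shared memory rather than a loop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One emulated thread's contribution: a target node and a value to add.
// (Hypothetical type, introduced only for this illustration.)
struct Contribution {
    std::size_t node;
    double value;
};

// Assumed number of collaborating threads per group.
constexpr std::size_t kGroupSize = 32;

// Baseline: every emulated thread updates "global memory" directly.
// Returns the number of updates issued; each would be an atomic on a GPU.
std::size_t scatter_direct(const std::vector<Contribution>& lanes,
                           std::vector<double>& global) {
    std::size_t updates = 0;
    for (const Contribution& c : lanes) {
        assert(c.node < global.size());
        global[c.node] += c.value;  // stands in for an atomic add
        ++updates;
    }
    return updates;
}

// Aggregated: within each group, threads targeting the same node combine
// their values first, so only one update per distinct node per group is issued.
std::size_t scatter_aggregated(const std::vector<Contribution>& lanes,
                               std::vector<double>& global) {
    std::size_t updates = 0;
    for (std::size_t base = 0; base < lanes.size(); base += kGroupSize) {
        const std::size_t end = std::min(base + kGroupSize, lanes.size());
        std::vector<bool> merged(end - base, false);
        for (std::size_t i = base; i < end; ++i) {
            if (merged[i - base]) continue;  // already folded into a leader
            // Lane i acts as the leader for its node and gathers matching lanes.
            double sum = lanes[i].value;
            for (std::size_t j = i + 1; j < end; ++j) {
                if (lanes[j].node == lanes[i].node) {
                    sum += lanes[j].value;
                    merged[j - base] = true;
                }
            }
            assert(lanes[i].node < global.size());
            global[lanes[i].node] += sum;  // single update for the whole set
            ++updates;
        }
    }
    return updates;
}
```

Both functions produce the same totals whenever the additions are exact, but the aggregated version issues fewer updates when threads in a group share targets. How much this helps on real hardware depends on the access pattern and on the cost of the in-group exchange, which this sketch does not model.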