GPU porting of scalable implicit solver with Green's function-based neural networks by OpenACC
Parallel Programming Languages and Models
Time: Sunday, 14 November 2021, 10:30am - 11am CST
Description: As computer architectures and HPC applications diversify, it is desirable to develop performance-portable applications that run on multiple architectures at relatively low development cost. Directive-based programming models such as OpenACC have been developed for this purpose and have been used successfully to port many equation-based HPC applications. As an example of porting a class of HPC applications that combine data-analytics methods with equation-based methods, we port an implicit solver with a neural network (NN)-type preconditioner for solving large-scale partial differential equation (PDE)-based problems. The scalable preconditioner is based on Green's functions that reflect the properties of the target PDE, which improves the accuracy and efficiency of using NNs to solve PDE-based problems. By designing kernel algorithms suited to the computer architecture and using OpenACC, we achieved high performance on recent GPUs at relatively low development cost. We obtained 64.4% of FP64 peak on ABCI compute nodes equipped with NVIDIA A100 GPUs, a 2.54-fold speedup over a highly tuned GPU implementation of a widely used PDE solver algorithm and a 38.9-fold speedup over an OpenMP-based CPU implementation running on the same system. Furthermore, we obtained 83.4% weak scalability from 8 to 256 A100 GPUs on the ABCI system, which enables solving large-scale problems of up to 25.7 billion degrees of freedom with high performance.
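To illustrate what directive-based porting looks like in practice, here is a minimal sketch of a sparse matrix-vector product annotated with OpenACC. It is not the kernel from this work: the function name `csr_spmv`, the CSR storage layout, and the choice of data clauses are all assumptions made for illustration. The point is only that the loop body stays ordinary C, and a compiler without OpenACC support ignores the pragmas and runs the same code on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not taken from the paper: y = A * x, where A is an
 * n_rows x n_cols sparse matrix in CSR format (row_ptr, col_idx, val). */
void csr_spmv(int n_rows, int n_cols, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    assert(n_rows >= 0 && n_cols >= 0);
    assert(row_ptr != NULL && y != NULL);

    /* Total number of stored nonzeros, needed for the data clauses below. */
    int nnz = row_ptr[n_rows];

    /* Rows are independent, so the outer loop is offloaded in parallel.
     * copyin/copyout move the arrays to and from device memory; a real
     * solver would keep them resident across iterations instead. */
    #pragma acc parallel loop \
        copyin(row_ptr[0:n_rows + 1], col_idx[0:nnz], val[0:nnz], x[0:n_cols]) \
        copyout(y[0:n_rows])
    for (int i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        /* Each row's dot product is accumulated sequentially by one thread. */
        #pragma acc loop seq
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k) {
            sum += val[k] * x[col_idx[k]];
        }
        y[i] = sum;
    }
}
```

With an OpenACC-capable compiler the loop is offloaded to the GPU; otherwise the pragmas are ignored and the function runs serially with the same result. Transferring the arrays on every call, as this sketch does, would dominate the runtime of an iterative solver, which is one reason kernel and data-movement design matter even when the port itself is only a few directives.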