Optimization Strategy for a Performance Portable Vlasov Code
Parallel Programming Languages and Models
Time: Sunday, 14 November 2021, 3:55pm - 4:20pm CST
Description: This paper presents optimization strategies applied to a kinetic plasma simulation code that uses OpenACC/OpenMP directives and the Kokkos performance-portable framework to run across multiple CPUs and GPUs. We evaluate the impact of the optimizations on multiple hardware platforms: Intel Xeon Skylake, Fujitsu Arm A64FX, and Nvidia Tesla P100 and V100. After the optimizations, the OpenACC/OpenMP version achieved speedups of 1.07 to 1.39, and the Kokkos version achieved speedups of 1.00 to 1.33. Because the impact of the optimizations is demonstrated across multiple combinations of kernels, devices, and parallel implementations, this paper provides a widely applicable approach to accelerating codes while preserving performance portability. To achieve good performance on both CPUs and GPUs, Kokkos could be a reasonable choice, as it offers more flexibility to manage multiple data and loop structures with a single codebase.
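As a rough illustration of what "multiple data and loop structures with a single codebase" can mean, the sketch below writes a kernel once and selects the memory layout through a template parameter. It is plain C++ and does not use the Kokkos API or reproduce the paper's code; the names `RowMajor`, `ColumnMajor`, `View2D`, `fill`, and `sum` are hypothetical and chosen only for this example. Kokkos provides analogous layout tags (`Kokkos::LayoutRight` and `Kokkos::LayoutLeft`) for its views.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (not the Kokkos API).
// Each one maps a 2D index (i, j) to an offset in a flat array.
struct RowMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t /*n0*/, std::size_t n1) {
        return i * n1 + j;  // j is the contiguous index
    }
};

struct ColumnMajor {
    static std::size_t index(std::size_t i, std::size_t j,
                             std::size_t n0, std::size_t /*n1*/) {
        return i + j * n0;  // i is the contiguous index
    }
};

// Minimal 2D array whose memory layout is chosen at compile time.
template <class Layout>
class View2D {
public:
    View2D(std::size_t n0, std::size_t n1)
        : n0_(n0), n1_(n1), data_(n0 * n1, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < n0_ && j < n1_);
        return data_[Layout::index(i, j, n0_, n1_)];
    }
    double operator()(std::size_t i, std::size_t j) const {
        assert(i < n0_ && j < n1_);
        return data_[Layout::index(i, j, n0_, n1_)];
    }

    std::size_t extent0() const { return n0_; }
    std::size_t extent1() const { return n1_; }
    const std::vector<double>& raw() const { return data_; }

private:
    std::size_t n0_, n1_;
    std::vector<double> data_;
};

// Kernel written once; it never refers to the underlying layout.
template <class Layout>
void fill(View2D<Layout>& f) {
    for (std::size_t i = 0; i < f.extent0(); ++i)
        for (std::size_t j = 0; j < f.extent1(); ++j)
            f(i, j) = 10.0 * static_cast<double>(i) + static_cast<double>(j);
}

// Reduction written once; the result is independent of the layout.
template <class Layout>
double sum(const View2D<Layout>& f) {
    double s = 0.0;
    for (std::size_t i = 0; i < f.extent0(); ++i)
        for (std::size_t j = 0; j < f.extent1(); ++j)
            s += f(i, j);
    return s;
}
```

Calling `fill` on a `View2D<RowMajor>` and on a `View2D<ColumnMajor>` gives the same logical contents, while the order of values in memory differs. That separation is what would let a layout be matched to a CPU or GPU access pattern without rewriting the kernel.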