Parallel SIMD - A Policy Based Solution for Free Speed-Up Using C++ Data-Parallel Types
Cloud and Distributed Computing
Parallel Programming Languages and Models
Parallel Programming Systems
System Software and Runtime Systems
Time: Monday, 15 November 2021, 11:30am - 12pm CST
Description: Recent additions to the C++ standard and ongoing standardization efforts aim to add data-parallel types to the C++ standard library. These types enable vectorization in existing C++ codes without relying on the compiler's ability to auto-vectorize. Integrating the existing parallel algorithms with these new data-parallel types opens up a new way of speeding up existing codes with minimal effort. Today, very little implementation experience exists for data-parallel execution of the standard parallel algorithms. In this paper, we report experiences and performance analysis results for our implementation of two new data-parallel execution policies usable with HPX's parallel algorithms module: simd and par_simd. We use the experimental implementation of data-parallel types provided by recent versions of the GCC and Clang C++ standard libraries. The benchmark results collected from artificial tests and real-world codes are very promising. Compared to sequenced execution, we report speed-ups of more than three orders of magnitude when using the newly implemented data-parallel execution policy par_simd with HPX's parallel algorithms. We also report that our implementation is performance portable across different compute architectures (x64 from Intel and AMD, and Arm) using different vectorization technologies (AVX2, AVX512, NEON64, and NEON128).
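To make "data-parallel types" concrete, here is a minimal sketch of an explicitly vectorized loop written with `std::experimental::simd` from the Parallelism TS v2. It is not the paper's implementation and does not use HPX. The function name `axpy` and the macro `SKETCH_HAVE_STD_SIMD` are hypothetical, chosen only for illustration. As a further assumption, the sketch enables the vector path only when compiled as C++17 or later against libstdc++, and otherwise falls back to a plain scalar loop.

```cpp
#include <cstddef>
#include <vector>

// Assumption: only enable the SIMD path on libstdc++ in C++17 or later,
// where <experimental/simd> is known to provide the operations used here.
#if defined(__GLIBCXX__) && __cplusplus >= 201703L
#  if __has_include(<experimental/simd>)
#    include <experimental/simd>
#    define SKETCH_HAVE_STD_SIMD 1  // hypothetical macro for this sketch
#  endif
#endif

// Hypothetical helper: computes y[i] = a * x[i] + y[i] for all i.
// Assumes x and y have the same length.
void axpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    const std::size_t n = x.size();
    std::size_t i = 0;

#ifdef SKETCH_HAVE_STD_SIMD
    namespace stdx = std::experimental;
    using pack = stdx::native_simd<float>;  // widest native vector of floats

    const pack av(a);  // broadcast the scalar into every lane

    // Main loop: process pack::size() elements per iteration.
    for (; i + pack::size() <= n; i += pack::size()) {
        pack xv(&x[i], stdx::element_aligned);  // load from x
        pack yv(&y[i], stdx::element_aligned);  // load from y
        yv = av * xv + yv;                      // element-wise arithmetic
        yv.copy_to(&y[i], stdx::element_aligned);  // store back to y
    }
#endif

    // Scalar loop: handles the remainder, or everything if SIMD is unavailable.
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `axpy(2.0f, x, y)` on two equally sized vectors updates `y` in place. The lane count comes from `pack::size()`, so the same source can map to different vector widths on different targets. The policies described in the abstract aim to supply this kind of vectorization through the parallel algorithms, rather than through a hand-written loop.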