Workshop: LLVM-HPC2021: The Seventh Workshop on the LLVM Compiler Infrastructure in HPC
Authors: Mahesh Lakshminarasimhan (University of Utah), Mahesh Ravishankar (Google LLC), and Mary Hall and Ponnuswamy Sadayappan (University of Utah)
Abstract: We present optimized code generation for data layout transformations in MLIR. For an input tensor of arbitrary order (dimensionality) and a given index permutation sequence, our code generator synthesizes high-performance vector code for the corresponding transposition through progressive lowering in MLIR. The code generator currently supports single-threaded transposition with explicit vectorization for double-precision data on AVX2-based processors. We are extending it to support parallel code generation and other vector instruction sets, and to integrate with existing MLIR-based tensor algebra compilers. Performance results show a significant speedup over the existing unoptimized MLIR implementation, TensorFlow, and Eigen, and performance comparable to the HPTT library.
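
To make the operation concrete, the transposition described in the abstract can be sketched as a scalar reference routine. This is only an illustration of the semantics, not the authors' generated code: it is neither vectorized nor lowered through MLIR. The function names (`row_major_strides`, `transpose_reference`), the row-major layout, and the convention that output dimension `k` takes its extent from input dimension `perm[k]` are all assumptions made for this sketch.

```python
from itertools import product


def row_major_strides(shape):
    """Element strides of a dense row-major tensor with the given shape."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def transpose_reference(data, shape, perm):
    """Scalar reference transposition of a flat row-major tensor.

    Assumed convention (hypothetical, chosen for this sketch):
    out_shape[k] = shape[perm[k]], and
    out[j_0, ..., j_{n-1}] = in[i_0, ..., i_{n-1}] with i[perm[k]] = j[k].
    """
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of 0..order-1")

    out_shape = [shape[p] for p in perm]
    in_strides = row_major_strides(shape)
    # Input stride that corresponds to each output dimension.
    permuted_strides = [in_strides[p] for p in perm]

    out = []
    # product() varies the last index fastest, so appending in this
    # order yields the output in row-major layout.
    for idx in product(*(range(n) for n in out_shape)):
        src = sum(i * s for i, s in zip(idx, permuted_strides))
        out.append(data[src])
    return out, out_shape


# Usage: transpose a 2x3 matrix stored as a flat list.
matrix_t, matrix_t_shape = transpose_reference([0, 1, 2, 3, 4, 5], [2, 3], [1, 0])
```

A routine like this reads the input with a non-unit stride in its innermost loop whenever the last dimension is permuted, which is the access pattern that an optimized, vectorized transposition aims to avoid.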