Optimizing Data Layout Transformations in MLIR
Parallel Programming Systems
Time: Sunday, 14 November 2021, 2:18pm - 2:25pm CST
Description: We present an optimized code generator for data layout transformations in MLIR. For an input tensor of arbitrary order (dimensionality) and a given index permutation, the generator synthesizes high-performance vector code for the corresponding transposition through progressive lowering in MLIR. It currently supports single-threaded transposition with explicit vectorization for double-precision data on AVX2-based processors. We are extending it to support parallel code generation and other vector instruction sets, and to integrate with existing MLIR-based tensor algebra compilers. Performance results show a significant speedup over the existing unoptimized MLIR implementation, TensorFlow, and Eigen, and performance comparable to the HPTT library.
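
To make the operation concrete, the following is a minimal scalar reference sketch of a tensor transposition for arbitrary order and a given index permutation. It is not the generated MLIR or AVX2 code described above. It only illustrates the semantics that such a generator would have to preserve. The function names, the row-major layout, and the convention that output dimension k takes its extent from input dimension perm[k] are assumptions made for illustration.

```python
from itertools import product


def row_major_strides(shape):
    """Return element strides for a dense row-major tensor of the given shape."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def transpose_reference(data, shape, perm):
    """Scalar reference transposition (hypothetical helper, not the paper's code).

    Assumed convention: output dimension k corresponds to input dimension
    perm[k], so out[j_0, ..., j_{n-1}] = in[i_0, ..., i_{n-1}] with
    j_k = i_{perm[k]}. Both tensors are stored flat in row-major order.
    """
    if sorted(perm) != list(range(len(shape))):
        raise ValueError("perm must be a permutation of the dimension indices")

    out_shape = [shape[p] for p in perm]
    in_strides = row_major_strides(shape)
    # Input stride that each output dimension walks along.
    permuted_strides = [in_strides[p] for p in perm]

    out = [0.0] * len(data)
    # product() enumerates indices with the last dimension varying fastest,
    # so the enumeration counter is exactly the row-major output offset.
    index_space = product(*(range(n) for n in out_shape))
    for out_offset, out_index in enumerate(index_space):
        in_offset = sum(i * s for i, s in zip(out_index, permuted_strides))
        out[out_offset] = data[in_offset]
    return out, out_shape


if __name__ == "__main__":
    # Transpose a 2x3 matrix of doubles into a 3x2 matrix.
    matrix = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    result, result_shape = transpose_reference(matrix, [2, 3], [1, 0])
    print(result_shape, result)
```

In this sketch the reads from the input are strided whenever the permutation moves the fastest-varying dimension, which is the access pattern that blocked, vectorized implementations are typically designed to avoid.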