Tensor Processing Primitives: A Programming Abstraction for Efficiency and Portability in Deep Learning Workloads
Event Type
Paper

Machine Learning and Artificial Intelligence
Time
Tuesday, 16 November 2021, 11am - 11:30am CST
Location
230-231-232
Description
During the past decade, novel deep learning (DL) algorithms/workloads and hardware have been developed to tackle a wide range of problems. Despite the advances in workload/hardware ecosystems, the programming methodology of DL systems is stagnant. DL workloads either leverage highly optimized, yet platform-specific and inflexible, kernels from DL libraries or, in the case of novel operators, rely on reference implementations built from DL-framework primitives that deliver underwhelming performance. This work introduces the Tensor Processing Primitives (TPP), a programming abstraction striving for efficient, portable implementation of DL workloads with high productivity. TPPs define a compact, yet versatile, set of 2D-tensor operators, which can subsequently be used as building blocks to construct complex operators on high-dimensional tensors. The TPP specification is platform-agnostic, so code expressed via TPPs is portable, whereas the TPP implementation is highly optimized and platform-specific. We demonstrate the efficacy of our approach using standalone kernels and end-to-end DL workloads expressed entirely via TPPs that outperform state-of-the-art implementations on multiple platforms.
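The building-block idea in the description can be illustrated with a minimal sketch. It is not the paper's API: the names gemm_2d, relu_2d, and blocked_fc_relu, the blocked tensor layout, and the choice of a fully connected layer are all assumptions made for illustration. Two small 2D primitives are defined once, and a higher-dimensional operator is then written purely as loops over 2D blocks that call them.

```python
# Hypothetical 2D primitives standing in for TPP-style building blocks.
# Names, signatures, and layout are illustrative assumptions, not the paper's API.

def gemm_2d(a, b, c):
    """Accumulate C += A @ B on small 2D blocks (lists of rows), in place."""
    for i in range(len(a)):
        for j in range(len(b[0])):
            acc = c[i][j]
            for k in range(len(b)):
                acc += a[i][k] * b[k][j]
            c[i][j] = acc


def relu_2d(x):
    """Apply elementwise max(x, 0) to a 2D block, in place."""
    for row in x:
        for j, v in enumerate(row):
            row[j] = v if v > 0 else 0


def blocked_fc_relu(w_blocks, x_blocks, bm, bn):
    """Fully connected layer followed by ReLU on blocked (4D) tensors.

    w_blocks[m][k] is a bm x bk block and x_blocks[k][n] is a bk x bn block.
    Returns y_blocks[m][n], where each entry is a bm x bn block.
    """
    M, K, N = len(w_blocks), len(x_blocks), len(x_blocks[0])
    y_blocks = [[[[0] * bn for _ in range(bm)] for _ in range(N)]
                for _ in range(M)]
    for m in range(M):
        for n in range(N):
            y = y_blocks[m][n]
            for k in range(K):
                # Only the 2D primitive touches the data; a platform-specific
                # implementation could replace it without changing this loop nest.
                gemm_2d(w_blocks[m][k], x_blocks[k][n], y)
            relu_2d(y)
    return y_blocks
```

In this sketch the loop nest is the portable part and the two primitives are the part that would be specialized per platform, which mirrors the split between specification and implementation described above.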
