ExaMPI Invited Talk: Designing High-Performance and Scalable Middleware for HPC and AI: Challenges and Opportunities
Parallel Programming Languages and Models
TimeSunday, 14 November 20219:01am - 10am CST
DescriptionThis talk will focus on challenges and opportunities in designing middleware for HPC, AI (Deep/Machine Learning), and Data Science for On-premise HPC and Cloud systems with advances in networking and accelerator technologies. For the HPC domain, we will discuss about the challenges in designing runtime environments for MPI+X programming models by taking into account support for multi-core systems (Xeon, ARM and OpenPower), high-performance networks (InfiniBand and RoCE), GPUs (including GPUDirect RDMA), and emerging BlueField-2 DPUs. Features, sample performance numbers and best practices of using MVAPICH2 libraries (http://mvapich.cse.ohio-state.edu)will be presented. For the Deep/Machine Learning domain, we will focus on MPI-driven solutions (http://hidl.cse.ohio-state.edu) to extract performance and scalability for popular Deep Learning frameworks (TensorFlow and PyTorch) and large out-of-core models. Accelerating Deep Learning applications with Bluefield-2 DPUs will also be presented. MPI-driven solutions to accelerate data science applications like Dask (http://hibd.cse.ohio-state.edu) will be highlighted. Finally, we will outline the challenges and experiences in deploying this middleware to the HPC cloud environments for Azure, AWS, and Oracle.