SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

How GPT-3 is Spearheading the Fourth Industrial Revolution


Authors: Kevin Jorissen (Oracle), Jonas Andrulis (Aleph Alpha GmbH), Joey Conway (NVIDIA Corporation)

Abstract: Scaling the training of a large language model such as GPT-3 across 64 NVIDIA A100 GPUs and beyond can be complex and difficult to tune for high performance. This BoF will bring industry experts and users together to discuss the current trends, challenges, and approaches to effectively scaling these training runs across large clusters of GPUs in the cloud. The goal is to foster discussions amongst the HPC and AI communities to share best practices, and for the platforms that host these workloads to understand how best to support those running these applications.

Long Description: Generative Pre-trained Transformer (GPT) models are language models that have significantly transformed the natural language processing (NLP) community by using deep learning to generate realistic human text. GPT models can be applied to a wide range of tasks to create anything that has language structure, including question answering, language translation, and code generation. Most recently, in June 2020, the third iteration of the GPT model, GPT-3, was initially released and has proven so powerful that entire startups have been built on it. However, such models present many challenges that need to be discussed and understood. GPT-3 is an extremely large model, with over 175 billion machine learning parameters. To put that in context, the largest trained language model before GPT-3 had 10 billion parameters. Scaling the training of such a large language model across 64 NVIDIA A100 GPUs and beyond can be very complex and difficult to tune for high performance.

This BoF will bring together industry and research experts from both the HPC and NLP communities to discuss the current trends, challenges, and approaches to effectively scaling these training runs across large clusters of GPUs on-premises and in the cloud. One goal of this BoF is to foster and encourage an open discussion amongst the HPC and NLP communities so they can share best practices. In addition, providers of the HPC platforms that host these workloads will be able to understand how best to support those running these applications.

Large AI language models like GPT-3 have the potential to reach far into the workforce through use cases such as chatbots, copywriting, coding assistance, and more. While GPT-3 is remarkably powerful, it has several limitations and risks associated with its usage. Another of this BoF’s goals, therefore, is to debate both the benefits and limitations of current GPT models and to discuss what the future may hold for GPT training.

This BoF has not been held before. The expected audience includes: HPC experts from academia, research labs, and industry with experience and interest in extreme-scale problems; NLP researchers, developers, and end users interested in learning HPC best practices for addressing large-scale, high-performance problems, so they can understand how best to scale their infrastructure to meet the needs of these large models; and anyone interested in deep learning, natural language processing, and the interplay of DL and NLP with HPC.

Given such massive scaling and performance problems, we expect that, as an outcome of this BoF, the NLP community will learn from the HPC community's experience addressing issues of scale and performance in many scientific domains. Similarly, we expect the HPC community to benefit from exposure to a new and active area of development that demands high performance at extreme scale and poses highly motivating challenges.

