SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Accelerating Storage IO to GPUs

Authors: CJ Newburn (NVIDIA Corporation), Glenn Lockwood (Lawrence Berkeley National Laboratory (LBNL)), Sven Oehme (DataDirect Networks (DDN))

Abstract: Whether you’re exploring petabytes of geological data, training massive neural networks, or modeling high-speed financial trading, your GPU-dependent applications are burdened by slow I/O when loading datasets. Historically, CPUs have controlled GPU I/O. As computation shifts from slower CPUs to faster GPUs, there is a growing need to bypass CPUs and accelerate GPU-storage traffic to relieve these performance bottlenecks. This session will provide a brief overview of key trends, available solutions, and illustrative application performance gains in this space. The majority of the session will be an open, forward-looking discussion with the gathered community on promising areas for investigation.

Long Description: Accelerating storage into GPUs is fundamentally a communal effort, since it involves technology vendors, ecosystem partners, middleware developers, application and framework developers, and end users. The BoF organizers and presenters span industry, government labs, and academia to represent this breadth of interests.

There is significant community interest in relieving storage-related bottlenecks in the context of GPUs. Dataset sizes are exploding and no longer fit in local memory. New technologies usher in astonishing new opportunities where it’s faster to load data from local or remote storage into GPUs than from local CPU memory. An increasing fraction of workflows have been ported to and accelerated by GPUs, such that the data path from storage leads directly into GPUs instead of making a stop in the CPU. The data path tends to use a direct, RDMA-enabled route wherever possible, and there are new possibilities for using RDMA in the context of object storage such as S3 as well.

The expertise that the HPC community brings to bear in tying together compute, storage, and networking is being applied to the infrastructure for an increasing number of data centers that employ AI. The number of such data centers is exploding [“Industrial Data Center Sales Grew Faster Than Hyperscalers,” CRN, February 24, 2021], and newly procured or redesigned data centers are looking for the best available technologies that tie together accelerated computing, storage, and networking in application domains that span deep learning, data analytics, visual processing, and more.

This is the first BoF at SC on this topic, but previous individual presentations by the organizers have had over 1000 views. All of the major storage vendors are engaging with this topic, opening access to a large array of customers in a large ecosystem. That reach has already stoked considerable excitement about the topic, and several vendors and end users are ready to jump in with examples of support and use cases. This extensive network creates opportunities for broad advertising reach in the HPC and storage communities through our respective contacts.

