SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

UCX: Unified Communication X Community


Authors: Gilad Shainer (NVIDIA Corporation), Jeff Kuehn (Los Alamos National Laboratory), Pavel Shamis (ARM Ltd), Dhabaleswar Panda (Ohio State University), Brad Benton (Advanced Micro Devices (AMD) Inc), Yanfei Guo (Argonne National Laboratory (ANL)), Steve Poole (Los Alamos National Laboratory)

Abstract: In order to exploit the capabilities of new HPC systems and to meet their scalability requirements, communication software needs to scale on millions of cores and support applications with adequate functionality. UCX is a collaboration among industry, national labs and academia that consolidates multiple technologies to provide a unified open-source framework.

The UCX project is managed by the UCF consortium (http://www.ucfconsortium.org/) and includes members from LANL, ANL, Ohio State University, AMD, ARM, IBM, NVIDIA and more. The session will serve as the UCX community meeting and will introduce the latest developments to HPC developers and the broader user community.


Long Description: In order to exploit the capabilities of new HPC systems and to meet their demands in scalability, communication software needs to scale on millions of cores and support applications with adequate functionality to express their parallelism. UCX is a collaboration among industry, national labs and academia that consolidates multiple technologies to provide a unified open-source framework. The UCX project is managed by the UCF consortium (http://www.ucfconsortium.org/) and includes members from LANL, ANL, Ohio State University, AMD, ARM, IBM, NVIDIA and more. Other entities supporting the open-source development include ORNL, Huawei, Atos and others. The session will serve as the UCX community meeting and will introduce the latest developments and specification to HPC developers and the broader user community.

Modern HPC systems include extreme numbers of compute elements and extremely low-latency interconnection networks. In order to exploit the capabilities of these architectures and to meet their demands in scalability, communication software needs to scale and support applications with adequate functionality to express their parallelism. Moreover, communication software should add as little overhead as possible in order to avoid compromising the native performance of the interconnection network. These requirements make the design of high-performance communication software extremely intricate, since they demand minimal memory requirements and low instruction counts and cache activity while meeting stringent performance targets.

High-level programming models for communication (e.g., MPI, SHMEM) can be built on top of middleware such as Portals, GASNet, UCCS, and ARMCI, or can use lower-level network-specific interfaces, often provided by the vendor. While the former offer high-level communication abstractions and portability across different systems, the latter offer proximity to the hardware and minimize the overhead of multiple software layers. UCX, a communication framework for high-performance computing systems, is an effort to combine the advantages of both.

Due to its importance to the future of HPC technologies and applications, UCX received a 2019 R&D 100 Award.

The UCF organization manages other open-source projects, including UCC (Unified Collective Communication), OpenSNAPI (Open Smart NIC API) and others. The session will include a brief overview of these projects and a call for participation.

