SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Composable Heterogeneous Computing Environments: DIY or Full Automation?


Authors: Martijn de Vries (Bright Computing Inc), Matt Demas (GigaIO), Frank Würthwein (University of California, San Diego)

Abstract: In this BoF, Bright Computing, GigaIO, and the San Diego Supercomputer Center will introduce the concept of composable heterogeneous computing environments, lay out the current challenges, and present approaches ranging from DIY to full automation. We will foster a technical discussion with the audience around the challenges and promise of composable infrastructures for HPC/AI, and then discuss different approaches for realizing easy-to-use composable systems. We will invite BoF attendees to share their experiences with other approaches to composable systems, the challenges they have faced, and the solutions they have used to deploy composable heterogeneous systems.

Long Description: In this Birds of a Feather (BoF), Bright Computing (Bright), a global leader in automation and management software for edge-to-core-to-cloud high-performance computing, and GigaIO, the creator of next-generation data center rack-scale architecture for artificial intelligence (AI) and high-performance computing (HPC) solutions, will introduce the concept of composable heterogeneous computing environments, lay out the current challenges, and present approaches ranging from DIY to full automation. In addition, the San Diego Supercomputer Center (SDSC) will share its experience with distributed computing architectures and the potential benefits of composable, heterogeneous systems. We will foster a technical discussion with the audience around the challenges and promise of composable infrastructures for HPC, and then discuss different approaches for realizing easy-to-use composable systems. This topic is highly relevant to the HPC industry because it stands to increase resource utilization, leverage heterogeneity, enable resource sharing, boost system performance, reduce time to results, and drive down OpEx and CapEx.

The Challenges: Organizations need to leverage new types of processors and accelerators, run new AI algorithms on servers from different manufacturers, integrate with the cloud, extend to the edge, and host machine learning and data analytics applications. Together, these needs call for a more flexible approach to delivering HPC resources. A shift of this magnitude reveals several challenges to automating a composable heterogeneous computing solution. Because composable nodes do not exist as an inventory of physical nodes, some workload management systems will not accept jobs that require them, while others will accept such jobs but never actually run them. Another challenge is preventing the workload manager from starving the composable nodes to the extent that new nodes cannot be composed.

The Solution: We will explore approaches ranging from DIY options to fully automated ones, such as Bright Computing’s new 9.2 integration with GigaIO's FabreX™ technology. With this integration, cluster administrators can compose nodes manually using the Cluster Manager Shell (CMSH), or nodes can be composed on the fly by Bright’s Auto Scaler.
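
As a rough illustration of why a scheduler that only sees existing nodes leaves such jobs pending, and how composing nodes on demand changes that, here is a minimal toy model in Python. It is a sketch under stated assumptions and does not use or represent the API of Bright Cluster Manager, CMSH, Auto Scaler, or FabreX; the names Job, Rack, static_schedule, compose_on_demand, and run_queue are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    gpus: int  # GPUs the job needs on a single node


@dataclass
class Rack:
    free_hosts: int   # bare servers with no accelerators attached
    pooled_gpus: int  # GPUs sitting in the shared, rack-level pool
    composed: list = field(default_factory=list)  # GPU counts of idle composed nodes


def static_schedule(job, rack):
    """Toy scheduler that can only place jobs on nodes that already exist."""
    for i, gpus in enumerate(rack.composed):
        if gpus >= job.gpus:
            rack.composed.pop(i)  # the node is now busy
            return True
    return False  # no matching node in the inventory: the job stays pending


def compose_on_demand(job, rack):
    """Hypothetical auto-scaling step: build a node sized for a pending job."""
    if rack.free_hosts < 1 or rack.pooled_gpus < job.gpus:
        return False  # nothing left to compose from (the starvation case)
    rack.free_hosts -= 1
    rack.pooled_gpus -= job.gpus
    rack.composed.append(job.gpus)
    return True


def run_queue(jobs, rack):
    """Try existing nodes first, then compose a node and retry."""
    started, pending = [], []
    for job in jobs:
        if static_schedule(job, rack) or (
            compose_on_demand(job, rack) and static_schedule(job, rack)
        ):
            started.append(job.name)
        else:
            pending.append(job.name)
    return started, pending
```

In this toy model, a rack with two free hosts and six pooled GPUs can start a 4-GPU job and a 2-GPU job, while a second 4-GPU job stays pending because the pool no longer holds enough GPUs. A real deployment would also have to release resources when jobs finish and reserve hosts so that ordinary jobs cannot consume everything needed for composition.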

By breaking the barrier of the server box, technologies such as GigaIO’s FabreX enable the entire rack to be treated as the compute unit. All resources normally located inside the server (GPUs, FPGAs, storage) can now be pooled in accelerator or storage appliances, where they are available to all the servers in the rack. Workload management systems such as Bright Cluster Manager enable users to quickly build and manage heterogeneous high-performance Linux clusters that host HPC, machine learning, and analytics applications spanning edge to core to cloud.

Research computing organizations have a choice ranging from workload managers to commercially integrated software-hardware solutions like the combination of Bright Computing and GigaIO. We will explore the pros and cons of these options throughout the session.



