SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Solving the Fabric Management Gordian Knot


Authors: Michael Aguilar (Sandia National Laboratories), Russell Herrell (Hewlett Packard Enterprise), Richelle Ahlvers (Intel Corporation), Jim Hull (Intelliprop Inc), Phil Cayton (Intel Corporation), Erich Hanke (Intelliprop Inc)

Abstract: Modern HPC and Enterprise computing systems can benefit from a more efficient way to assemble and control network fabrics. The OpenFabrics Alliance (OFA), together with its partners the DMTF, SNIA, and the Gen-Z Consortium, is developing a new open-source fabric management framework that provides a unified set of tools to control and monitor multiple network fabric types. This BoF seeks input from the HPC network, storage, and security communities to help shape tools that meet their needs, and invites open participation in the development of the OpenFabrics Management Framework.

Long Description: The increasingly complex computing problems tackled today are creating diverse requirements for the fabric management tools and applications needed to operate architecturally complex computing systems. Developers of these tools and applications, in turn, face a growing variety of fabric types (InfiniBand, Gen-Z, Slingshot, and others).

Disaggregated resources, such as memory, storage, compute, and accelerators, are interconnected by high-speed fabrics. With no common way of querying or manipulating these fabrics and resources, a Gordian Knot of fabrics and resource allocation is being created. The victims of this conundrum are the System Administrators, Application Designers, and System Architects who design, deploy, maintain, and use any sort of fabric-based computing system and who must supply their users with reliable, high-performance systems. This includes High-Performance Computing, Machine Learning, cloud-based, and Enterprise systems.

The OpenFabrics Alliance (OFA), together with its partners the DMTF, SNIA, and the Gen-Z Consortium, is launching an effort to design and develop an open fabric management framework intended to help slice through this Gordian Knot. The framework consists of a set of common tools for managing and manipulating underlying fabrics in an abstract way. Using these tools, clients can, through the framework's APIs and methods, create resource/client associations, build sub-fabrics and aggregate super-fabrics, retrieve performance information, and manipulate the underlying fabrics.

The resulting framework is intended to be used by clients to deliver security services, switch and endpoint inventories, route management, telemetry, performance monitoring and diagnostics, and more. Currently targeted clients include Workload Managers, MPI and SHMEM, distributed deployment services, and others.

The collaboration intends to create an open fabric management framework that uses Redfish as the management standard for modeling these complex fabrics. Redfish currently provides a high-level fabric representation, along with detailed models for some fabric types; the collaboration is focused on completing the detailed representation across multiple fabric types.
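As a rough illustration of what a Redfish fabric model looks like to a client, the sketch below walks a small set of hand-written Redfish payloads, starting at the Fabrics collection and following links to each fabric's switches and endpoints. The payloads are illustrative only, not captured from a real service. The URI layout and property names (`Members`, `@odata.id`, `FabricType`, `Switches`, `Endpoints`) follow the general shape of the DMTF Redfish Fabric schema as we understand it, while the fabric ID, the `SAMPLE_SERVICE` dictionary, and the helper functions are hypothetical stand-ins, not part of the OpenFabrics Management Framework.

```python
import json

# Hypothetical, hand-written Redfish payloads keyed by URI. A real client
# would fetch each URI with an authenticated HTTP GET against a Redfish
# service; this dictionary stands in for that service so the sketch runs
# offline. All IDs and values are illustrative assumptions.
SAMPLE_SERVICE = {
    "/redfish/v1/Fabrics": {
        "@odata.id": "/redfish/v1/Fabrics",
        "Members": [{"@odata.id": "/redfish/v1/Fabrics/GenZ"}],
    },
    "/redfish/v1/Fabrics/GenZ": {
        "@odata.id": "/redfish/v1/Fabrics/GenZ",
        "Id": "GenZ",
        "FabricType": "GenZ",
        "Switches": {"@odata.id": "/redfish/v1/Fabrics/GenZ/Switches"},
        "Endpoints": {"@odata.id": "/redfish/v1/Fabrics/GenZ/Endpoints"},
    },
    "/redfish/v1/Fabrics/GenZ/Switches": {
        "Members": [{"@odata.id": "/redfish/v1/Fabrics/GenZ/Switches/1"}],
    },
    "/redfish/v1/Fabrics/GenZ/Endpoints": {
        "Members": [
            {"@odata.id": "/redfish/v1/Fabrics/GenZ/Endpoints/1"},
            {"@odata.id": "/redfish/v1/Fabrics/GenZ/Endpoints/2"},
        ],
    },
}


def get(uri):
    """Hypothetical stand-in for an HTTP GET: return the stored payload."""
    return SAMPLE_SERVICE[uri]


def member_uris(collection_uri):
    """Return the @odata.id of every member listed in a collection."""
    return [m["@odata.id"] for m in get(collection_uri).get("Members", [])]


def summarize_fabrics(root="/redfish/v1/Fabrics"):
    """Build a per-fabric inventory of switch and endpoint URIs."""
    summary = {}
    for fabric_uri in member_uris(root):
        fabric = get(fabric_uri)
        summary[fabric["Id"]] = {
            "type": fabric.get("FabricType"),
            "switches": member_uris(fabric["Switches"]["@odata.id"]),
            "endpoints": member_uris(fabric["Endpoints"]["@odata.id"]),
        }
    return summary


if __name__ == "__main__":
    print(json.dumps(summarize_fabrics(), indent=2))
```

Because every fabric is reached through the same collection-and-link pattern, the traversal does not depend on the fabric type; a real client would additionally need authentication, error handling, and tolerance for properties a given service does not implement.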

This BoF targets the following communities: developers of fabrics, compute, accelerators, storage, and memory; developers of fabric management solutions and tools such as automation, composition, and orchestration; and those developing solutions that rely on accurate, easy access to fabric information, such as workload managers, task brokers, telemetry services, operations management, AI, Big Data analytics, and performance tuning applications.

This BoF is a Call to Action for those communities to discuss fabric management use cases, provide feedback on the most urgent problems facing them, and help collect initial requirements. We also aim to raise awareness of the OpenFabrics Management Framework and to recruit participants for its development.


URL: https://www.openfabrics.org/openfabrics-management-framework/

