Multi-Tenancy Management and Zero Downtime Upgrades Using Cray-HPE Shasta Supercomputers
Time: Friday, 19 November 2021, 10:50am - 11:10am CST
Description: As supercomputing systems gradually become an integral part of data-driven workflows such as ML/AI or tightly coupled pre/post-processing pipelines, users need programmable access to shared resources to avoid moving large volumes of data to dedicated systems or to clouds. In public clouds, or in private ones built on technologies like OpenStack, multi-tenancy on shared hardware is commonplace, offering users programmable and privileged access to compute, network and storage resources. Such access is unavailable to users on batch-scheduled, multi-petascale supercomputing systems, which are designed to achieve close-to-metal performance for scientific applications at unprecedented scales. In this paper, we focus on multi-tenancy within the hardware and software stacks of Cray-HPE EX Shasta supercomputing systems for creating high-performance and cloud virtual clusters for HPC and AI/ML workloads, respectively. Using orchestration examples for zero-downtime upgrades of virtual clusters, we demonstrate the benefits of multi-tenancy for achieving close-to-metal performance, alongside resource elasticity and customization, without interruption to operational services.
