Workshop: CANOPIE-HPC: Containers and New Orchestration Paradigms for Isolated Environments in HPC
Authors: Yan-Tyng (Sherry) Chang, Steve Heistand, Robert Hood, and Henry Jin (NASA Ames Research Center)
Abstract: This work investigates the feasibility of using Singularity to run MPI applications in “hybrid” MPI mode on NASA’s HECC resources. Two types of applications were tested: HPC and AI/ML. On the HPC side, two JEDI containers for Earth science modeling, built with Intel MPI, were tested on both HECC in-house and HECC AWS Cloud CPU resources. On the AI/ML side, an NVIDIA TensorFlow container built with Open MPI was tested with an NCF recommender system and the ResNet-50 image classification model on the HECC in-house V100 GPUs. Our exercises demonstrate that although porting containers to run on a single node using only the container's MPI is quite straightforward, running across multiple nodes in hybrid MPI mode requires knowledge of Singularity, the MPI libraries, the operating system image, and the communication infrastructure, such as the transport and network layers.