Workshop: 11th International Workshop on Runtime and Operating Systems for Supercomputers (ROSS)
Authors: Brian Barrett (Amazon Web Services)
Abstract: HPC has long been the realm of the Supercomputer, a specialized machine dedicated to large-scale MPI applications. Building these machines required carefully balancing network, processor, and memory technologies and chasing every system inefficiency in order to achieve peak performance. Over time, many of these components have become commoditized: memory, processors, and now even networking components are largely commodity options. This has opened the door to using Cloud computing infrastructure for HPC applications. Like current supercomputers, HPC in the Cloud requires intelligent system software to allow application developers to manage the complexity of the system (while still leaving time to get some real work done). In this talk, we'll present some of the challenges we have faced in trying to run HPC applications in a large-scale Cloud environment, some of the challenges we expected but did not face, and some of the solutions we have assembled for building successful HPC environments. Finally, we will discuss areas of research that we believe are critical to making HPC in the Cloud more than just another Supercomputer.