Workshop: WORKS21: 16th Workshop on Workflows in Support of Large-Scale Science
Authors: Benjamin T. Shealy (Clemson University, Electrical and Computer Engineering), F. Alex Feltus, and Melissa C. Smith (Clemson University)
Abstract: Scientific workflows and high-performance computing (HPC) systems are critically important to modern scientific research. In order to perform scientific experiments at scale, domain scientists must have knowledge and expertise in software and hardware systems that are highly complex and rapidly evolving. While computational expertise will be essential for domain scientists going forward, any tools or practices that reduce this burden for domain scientists will greatly increase the rate of scientific discoveries. One such example is knowing ahead of time the resource usage patterns of an application for the purpose of resource provisioning. A tool that accurately estimates these resource requirements would benefit HPC users in many ways, by reducing job failures and queue times on traditional HPC systems and reducing costs on cloud computing systems. In this work we present Tesseract, a semi-automated tool that predicts resource usage for any application on any computing platform, from historical data, with minimal input from the user. We employ Tesseract to predict runtime, memory usage, and disk usage for a diverse set of scientific workflows, and in particular we show how these resource estimates can prevent under-provisioning.