Science Capsule: Toward Sharing and Reproducibility of Scientific Workflows
Cloud and Distributed Computing
Time: Monday, 15 November 2021, 5pm - 5:25pm CST
Description: Workflows are increasingly processing large volumes of data from scientific instruments, experiments, and sensors. These workflows often consist of complex data processing and analysis steps that may involve a human in the loop and use a diverse set of analysis tools. Sharing and reproducing these workflows with collaborators and the larger community is critical, but hard to do without the entire context of the workflow, including user notes and the execution environment. In this paper, we introduce Science Capsule, which automatically captures and processes events associated with the execution and data life cycle of workflows and provides ways to enhance that information with user artifacts. It also allows users to create 'workflow snapshots' that keep track of the different versions of a workflow and their lineage, allowing scientists to incrementally share and extend workflows. Our results show that Science Capsule can process and organize events in near real-time for high-throughput experimental and analysis workflows without incurring significant performance overhead.