Workshop: SuperCheck-SC21: Second International Symposium on Checkpointing for Supercomputing
Parallel Programming Languages and Models
Reliability and Resiliency
Time: Monday, 15 November 2021, 9am - 5:30pm CST
Description: As a primary approach to fault-tolerant computing, Checkpoint/Restart (C/R) is essential to a wide range of high performance computing (HPC) communities. While there has been much C/R research and tools development, continued C/R research is indispensable to keep pace with ever-changing HPC architectures, technologies, and workloads. More effort is also needed to narrow the gap between proof-of-concept C/R research codes and production-quality codes capable of deployment in real-world workloads. In this workshop, we will bring together C/R researchers and tools developers, practitioners, application developers, and end users to focus on C/R research and successes in production use, motivating the development of usable C/R tools, the closing of the gap between state-of-the-art research and production, and the harnessing of the full benefits of C/R for the HPC community.