Second International Symposium on Checkpointing for Supercomputing
Event Type
Workshop
Fault Tolerance
MPI
Reliability and Resiliency
Checkpoint/restart
Time
Monday, 15 November 2021
9am - 9:10am CST
Location
227
Description
As a primary approach to fault-tolerant computing, Checkpoint/Restart (C/R) is essential to a wide range of HPC communities. Although much C/R research and tool development already exists, continued research is indispensable to keep pace with ever-changing HPC architectures, technologies, and workloads. More effort is also needed to narrow the gap between proof-of-concept C/R research codes and production-quality codes that can be deployed for real-world workloads. This workshop brings together C/R researchers, tool developers, practitioners, application developers, and end users to focus on C/R research and successes in production use. The aims are to motivate the development of usable C/R tools, close the gap between state-of-the-art research and production, and harness the full benefits of C/R for the HPC community. Paper submissions will be peer-reviewed, and a venue for accepted papers will be identified. We especially encourage PhD students and HPC end users to participate.