SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Toward Access Pattern Aware Checkpointing for Kokkos Applications


Student: Nigel Tan (University of Tennessee, Knoxville)
Supervisor: Michela Taufer (University of Tennessee, Knoxville)

Abstract: The common checkpointing philosophy, "checkpoint everything as frequently as possible," is becoming ineffective as we progress toward exascale machines, which face a shrinking mean time between failures. This makes portability and resilience vital for the future of HPC. This poster demonstrates the need for checkpointing that takes advantage of application properties and lays the foundation for it. Specifically, using incremental checkpoints of sparsely updated data as an example, we show how access-pattern-aware checkpointing improves performance. We also outline how the portable checkpointing abstractions in Kokkos Resilience can be modified to support such an enhancement transparently.
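The abstract does not give implementation details. As a rough illustration of why incremental checkpointing helps with sparsely updated data, here is a minimal Python sketch. It splits a fixed-size buffer into chunks, hashes each chunk, and stores only the chunks whose hash changed since the previous checkpoint. The class name `IncrementalCheckpointer`, the 4-byte chunk size, and SHA-256 change detection are hypothetical choices for illustration. They are not the Kokkos Resilience API or the authors' method.

```python
import hashlib
from typing import Dict, List, Optional


class IncrementalCheckpointer:
    """Hypothetical sketch: the first checkpoint stores every chunk; later
    checkpoints store only chunks whose SHA-256 digest changed."""

    def __init__(self, chunk_size: int = 4) -> None:
        self.chunk_size = chunk_size               # bytes per chunk (illustrative value)
        self.size: Optional[int] = None            # buffer length, fixed at first checkpoint
        self.prev_hashes: List[bytes] = []         # digests from the previous checkpoint
        self.history: List[Dict[int, bytes]] = []  # per checkpoint: chunk index -> bytes

    def _chunks(self, buf: bytes) -> List[bytes]:
        n = self.chunk_size
        return [bytes(buf[i:i + n]) for i in range(0, len(buf), n)]

    def checkpoint(self, buf: bytes) -> int:
        """Record a checkpoint and return the number of bytes actually stored."""
        if self.size is not None and len(buf) != self.size:
            raise ValueError("this sketch assumes a fixed-size buffer")
        chunks = self._chunks(buf)
        hashes = [hashlib.sha256(c).digest() for c in chunks]
        if self.size is None:
            changed = list(range(len(chunks)))     # first checkpoint is a full copy
            self.size = len(buf)
        else:
            # Positional comparison: a chunk is saved only if its digest differs.
            changed = [i for i, (h, p) in enumerate(zip(hashes, self.prev_hashes)) if h != p]
        diff = {i: chunks[i] for i in changed}
        self.history.append(diff)
        self.prev_hashes = hashes
        return sum(len(c) for c in diff.values())

    def restore(self, version: int = -1) -> bytes:
        """Rebuild the buffer as of checkpoint `version` by replaying diffs in order."""
        if not self.history:
            raise ValueError("no checkpoints recorded")
        last = version % len(self.history)
        out = bytearray(self.size)
        n = self.chunk_size
        for diff in self.history[: last + 1]:
            for i, data in diff.items():
                out[i * n:i * n + len(data)] = data
        return bytes(out)


buf = bytearray(16)                  # 4 chunks of 4 bytes, all zero
cp = IncrementalCheckpointer(chunk_size=4)
print(cp.checkpoint(buf))            # 16: full first checkpoint
buf[5] = 7                           # sparse update touching only chunk 1
print(cp.checkpoint(buf))            # 4: only chunk 1 stored
print(cp.checkpoint(buf))            # 0: nothing changed
print(cp.restore() == bytes(buf))    # True
```

Because only one chunk changed between the first two checkpoints, the second checkpoint stores 4 bytes instead of 16. One trade-off in this sketch is that hashing still scans the whole buffer at every checkpoint. Knowing in advance which data an application writes, which is its access pattern, is one way such a scan could be narrowed.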

ACM-SRC Semi-Finalist: no


