BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T055404Z
LOCATION:Online
DTSTART;TZID=America/Chicago:20211114T080000
DTEND;TZID=America/Chicago:20211114T170000
UID:submissions.supercomputing.org_SC21_sess186_tut104@linklings.com
SUMMARY:Fault-Tolerance for High Performance and Big Data Applications: Th
 eory and Practice
DESCRIPTION:Tutorial\n\nFault-Tolerance for High Performance and Big Da
 ta Applications: Theory and Practice\n\nBosilca\, Bouteiller\, Herau
 lt\, Robert\n\nResilience is a critical issue for large-scale platfo
 rms\, and this tutorial provides a comprehensive survey of fault-toler
 ant techniques for high-performance and big data applications\, wi
 th a fair balance between theory and practice. This tutorial is organ
 ized along four main topics: an overview of failure types (software/h
 ardware\, transient/fail-stop) and typical probability distrib
 utions (Exponential\, Weibull\, Log-Normal)\; general-purpose techni
 ques\, which include several checkpoint and rollback recovery protoc
 ols\, replication\, prediction and silent error detection\; applicat
 ion-specific techniques\, such as user-level in-memory checkpointin
 g\, data replication (map-reduce) or fixed-point convergence for it
 erative applications (back-propagation)\; and practical deployment o
 f fault-tolerance techniques with User Level Fault Mitigation (a pro
 posed MPI standard extension).\n\nThe tutorial is open to all SC21 att
 endees who are interested in the current status and expected promis
 e of fault-tolerant approaches for scientific and big data applicat
 ions. Basic knowledge of MPI will be helpful for the hands-on sessi
 on.\n\nTag: Online Only\, Reliability and Resiliency\n\nRegistration C
 ategory: Tutorial Reg Pass
END:VEVENT
END:VCALENDAR
