BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T055352Z
LOCATION:Second Floor Atrium
DTSTART;TZID=America/Chicago:20211117T083000
DTEND;TZID=America/Chicago:20211117T170000
UID:submissions.supercomputing.org_SC21_sess255_drs103@linklings.com
SUMMARY:Holistic Performance Analysis and Optimization of Unified Virtual 
 Memory
DESCRIPTION:Doctoral Showcase, Posters\n\nHolistic Performance Analysis an
 d Optimization of Unified Virtual Memory\n\nAllen, Ge\n\nHigh-performance 
 computing systems have seen tremendous growth in theoretical performance w
 ith the inclusion of Graphics Processing Units (GPUs) and other accelerato
 rs. The difficulty of programming these systems has grown alongside the pe
 rformance as programmers are required to manage separate programming model
 s for accelerators alongside traditional CPU resources. This complexity wa
 s mitigated with the introduction of heterogeneous shared memory systems, 
 such as NVIDIA Unified Virtual Memory (UVM) and Heterogeneous Memory Manag
 ement (HMM), which remove the need for programmers to manage memory direct
 ly. These technologies reduce programming complexity by managing the p
 hysical location of memory on behalf of the programmer. However, there is 
 a significant performance gap between heterogeneous shared memory and dire
 ctly managed memory in modern systems, substantially reducing effective ap
 plication performance and the efficiency of the underlying HPC systems. Wh
 ile prior work has taken steps to understand the high-level performance of
  these systems, the underlying performance issues are not well understood 
 and therefore challenging to resolve. In this work, we introduce our metho
 dology for deep investigation of UVM performance at the systems-software a
 nd workload-generation level. This work will identify the costs within UVM
 , including the fundamental costs of UVM systems software as well as the s
 ources of overhead that can be eliminated. Based on this initi
 al performance study, we also investigate optimizations to UVM to increase
  overall system performance and improve HPC application performance automa
 tically. Further, we seek to generalize these improvements and insights su
 ch that they can be applied to other systems software and devices, such as
  HMM.\n\nTag: In-Person Only\n\nRegistration Category: Tech Program Reg Pa
 ss, Exhibit Hall Only
END:VEVENT
END:VCALENDAR