BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T055346Z
LOCATION:Second Floor Atrium
DTSTART;TZID=America/Chicago:20211118T083000
DTEND;TZID=America/Chicago:20211118T170000
UID:submissions.supercomputing.org_SC21_sess257_drs102@linklings.com
SUMMARY:Performance Profiling, Analysis, and Optimization of GPU-Accelerat
 ed Applications
DESCRIPTION:Doctoral Showcase, Posters\n\nPerformance Profiling, Analysis,
  and Optimization of GPU-Accelerated Applications\n\nZhou, Mellor-Crummey\
 n\nGPUs have emerged as a key component for accelerating applications in v
 arious domains, including deep learning, data analytics, and scientific si
 mulations. While GPUs provide superior compute power and higher memory ban
 dwidth than CPUs, writing efficient GPU code to achieve maximum possible p
 erformance is challenging because of the sophisticated programming models 
 and architectural features. Performance tools for GPUs are designed to pin
 point performance bottlenecks in GPU-accelerated applications and provide 
 performance insights for users. Existing performance tools, however, are i
 nsufficient to identify hotspots and provide insights for complex applicat
 ions. To address these challenges, we developed a collection of GPU perfor
 mance tools to measure, analyze, and optimize GPU-accelerated applications
 . Our GPU profiler employs instruction sampling and instrumentation to col
 lect a wide range of GPU metrics and adopts novel wait-free data structure
 s to coordinate performance monitoring and attribution with low overhead. 
 Our analysis tool constructs sophisticated GPU calling contexts to help us
 ers pinpoint hot GPU code. To understand inefficiencies in hotspots, the
  an
 alysis tool identifies problematic value patterns on accessed memory addre
 sses. Further, the analysis tool interprets performance metrics and analyz
 es bottlenecks by attributing measured instruction stalls to their root ca
 uses and matching inefficient code with optimization suggestions. To demon
 strate the effectiveness of our tools, we studied many deep learning and H
 PC applications. Guided by the insightful performance reports generated by
  our tools, we identified performance hotspots, confirmed the issues with 
 application developers, and proposed optimizations. In this proposal, we s
 ummarize the work we have done so far and describe plans for future work.\
 n\nTag: In-Person Only\n\nRegistration Category: Tech Program Reg Pass, Ex
 hibit Hall Only
END:VEVENT
END:VCALENDAR
