BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T055403Z
LOCATION:Second Floor Atrium
DTSTART;TZID=America/Chicago:20211116T083000
DTEND;TZID=America/Chicago:20211116T170000
UID:submissions.supercomputing.org_SC21_sess278_rpost136@linklings.com
SUMMARY:Analyzing Complex Memory Systems
DESCRIPTION:Posters\, Research Posters\n\nAnalyzing Complex Memory Systems\
 n\nButcher\n\nSeveral recent systems in the Top500 include many-core chips
  with complex memory systems\, including multiple memory channels.
  Many man
 y-core chips feature an intermediate layer of memory with higher bandwidth
  and lower capacity than main memory. Intermediate memory exists either in
  a cache or a separate address space.\n\nThis paper uses Intel's Knights
  Landing (KNL) processor as a testbed. It includes both intermediate
  memory
  and multiple architectural knobs to adjust affinity. We present cache-obl
 ivious and chunking algorithms for sort\, matrix-multiply and Fast
  Fourier Transforms (FFT)\, and compare them to state-of-the-art codes.
  Experimenting wit
 h a wide range of problem types and algorithmic solutions gives insight in
 to how affinity can affect performance. Chunking often achieves low utili
 zation of the memory system as the cost of adding threads to move data out
 weighs the benefit of improved bandwidth. The results achieved with straig
 htforward cache-oblivious codes are competitive with state-of-the-art code
 s.\n\nRegistration Category: Tech Program Reg Pass\, Exhibit Hall Only
END:VEVENT
END:VCALENDAR
