BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T055352Z
LOCATION:Second Floor Atrium
DTSTART;TZID=America/Chicago:20211117T083000
DTEND;TZID=America/Chicago:20211117T170000
UID:submissions.supercomputing.org_SC21_sess279@linklings.com
SUMMARY:Research Posters Display
DESCRIPTION:Posters, Research Posters\n\nParallel Framework for Updating L
 arge-Scale Dynamic Networks\n\nSrinivasan, Pandey, Khanda, Srinivasan, Das
 \n\nAnalysis of large-scale dynamic networks is vital for understanding th
 e relationship between entities that constantly change over time. Unfortun
 ately, existing algorithms for identifying graph properties are optimized 
 for static networks and resort to recomputing those properties over the en
 tire net...\n\n---------------------\nFeature Reduction of Darshan Counter
 s Using Evolutionary Algorithms\n\nRajesh, Koziol, Byna, Tang, Bez...\n\nF
 eature reduction is an integral part of data preparation in machine learni
 ng. It helps denoise the data and makes it easier to fit the model. Predic
 ting the performance of an application using Darshan counters can be trick
 y due to the large amount of data available, with not all of them being pe
 rti...\n\n---------------------\nGASNet-EX Memory Kinds: Support for Devic
 e Memory in PGAS Programming Models\n\nHargrove, Bonachea, MacLean, Waters
 \n\nThere is an emerging need for adaptive, lightweight communication in i
 rregular HPC applications at exascale, where GPU accelerators provide the 
 majority of available compute cycles.  To address this need, Lawrence Berk
 eley National Lab is developing a programming system to support distribute
 d-memory...\n\n---------------------\nHandling C++ Exceptions in MPI Appli
 cations\n\nJaros\n\nHandling error states in C++ applications is managed b
 y exceptions. In distributed applications, it is necessary to inform the o
 ther processes that something went wrong, and the application should
  either recover from the faulty state or report the error and terminate gr
 acefully. Unfortunately,...\n\n---------------------\nFargraph: Optimizing
  Graph Workload on RDMA-Based Far Memory Architecture\n\nWang, Li, Wang, Z
 hang, Wang...\n\nDisaggregated architecture brings new opportunities to me
 mory-consuming applications like graph analytics. It allows one to shift
  memory access pressure from local to far memory, providing an attractiv
 e alternative to disk-based processing. In this paper, we take the first s
 tep to analyze the im...\n\n---------------------\nAnalyzing Complex Memor
 y Systems\n\nButcher\n\nSeveral recent systems in the Top500 include many-
 core chips with complex memory systems, including multiple memory channels
 . Many many-core chips feature an intermediate layer of memory with higher
  bandwidth and lower capacity than main memory. Intermediate memory exists
  either in a cache or a sepa...\n\n---------------------\nDetecting Networ
 k Intrusion Anomalies through Egonet-Based Data Mining with Apache Spark\n
 \nPaik, Kwak, Lu\n\nNetwork intrusions often contain dangerous breaches to
  network security systems and their data. We design an anomaly detection s
 ystem to identify network intrusions. Our proposed detection method is ins
 pired by the use of egonets in the oddball algorithm but differs by the ex
 tracted features and the...\n\n---------------------\nHyperQueue: Overcomi
 ng Limitations of HPC Job Managers\n\nBöhm, Beránek, Cima, Macháček,
  Jha...\n\nIn recent years, HPC workloads and communities have undergone su
 bstantial paradigm shifts. There is an increasing number of users that wan
 t to leverage HPC clusters to execute many simple and embarrassingly paral
 lel tasks as easily as possible. Due to the limitations of traditional HPC
  job managers,...\n\n---------------------\nAn Interactive GPU Metric Dash
 board for HPC clusters\n\nShi, Cook, Chatterjee, Blaschke\n\nScaling up pr
 ograms to run at the scale of a modern high-performance computing (HPC) ce
 nter can be a daunting task. One of the first questions developers ask is:
  “Is my program using all the hardware available?” Many tools can extract 
 detailed performance data on applications. But the level of detai...\n\n--
 -------------------\nMultiple Same Level and Telescoping Nesting in GFDL’s
  FV3\n\nMouallem, Benson, Harris\n\nThe current two-way, single nest capab
 ility in the FV3 dynamical core, used in weather and climate applications 
 by a diverse group of institutions and organizations, is upgraded with the
  capability to employ multiple same-level nests as well as telescope, or e
 mbed, the various nests within each othe...\n\n---------------------\nHete
 rogeneous Computing for Undergraduates: A Module-Driven Approach\n\nQasem,
  Bunde, Schielke\n\nTo achieve aggressive performance and power goals, HPC
  systems are becoming increasingly heterogeneous.  This represents an educ
 ational challenge since few current curricula include much about heterogen
 eous computing except possibly in upper-division electives.  This poster p
 resents a set of modules...\n\n---------------------\nTowards a Scalable a
 nd Distributed High-Performance SHAD C++ library\n\nWu, Castellana, Kaiser
 \n\nSHAD is the Scalable High-performance Algorithms and Data-structures C
 ++ library, providing general purpose building blocks and supporting high-
 level custom utilities. SHAD is designed with scalability, flexibility, pr
 oductivity and portability in mind, and serves as a playground for researc
 h in par...\n\n---------------------\nChaining Multiple Tools and Librarie
 s Using Gotcha\n\nXu, Mohror, Devarajan, Stanavige, Bhatele\n\nIn HPC, it 
 is common to use tools to understand the performance of applications. User
 s often want to run applications linked with multiple performance tools to
  complement each other, as tools may record different information. Many to
 ols operate by intercepting application function calls with "wrappe...\n\n
 ---------------------\nLearning-Based Content Delivery in 5G-Enabled Multi
 -Access Edge Computing\n\nFarhangi Maleki, Ma, Mashayekhy, La Roche\n\nThe
  demand for content such as multimedia services with high performance (e.g
 ., ultra-low latency) requirements has increased significantly, causing hea
 vy backhaul congestion in mobile networks. The integration of multi-access
  edge computing (MEC) and 5G network is an emerging solution that alleviat
 e...\n\n---------------------\nAccurate Throughput Prediction of Basic Blo
 cks on Recent Intel Microarchitectures\n\nAbel, Reineke\n\nTools to predic
 t the throughput of basic blocks on a specific microarchitecture are usefu
 l to optimize software performance and to build optimizing compilers. In r
 ecent work, several such tools have been proposed. The accuracy of their p
 redictions, however, has been shown to be relatively low. To a ...\n\n----
 -----------------\nRIKEN CGRA: Data-Driven Architecture as an Extension of
  Multicore CPU for Future HPC\n\nAdhi, Kojima, Tan, Podobas, Sano\n\nA coa
 rse-grained reconfigurable array (CGRA) is currently gaining momentum to e
 nter the HPC market as deep-learning accelerators due to its potential for
  performance scalability and energy efficiency. Traditionally, the usage o
 f CGRA was limited to embedded systems to provide extra computing perform.
 ..\n\n---------------------\nSODA-OPT: System-Level Design in MLIR for HLS
 \n\nBohm Agostini, Kaeli, Tumeo\n\nHigh-level synthesis (HLS) enables the
  generation of hardware descriptions from applications implemented with hig
 h-level languages.  State-of-the-art tools, however, typically require the
  application to be manually translated to C/C++ and carefully annotated to
  improve final design performance. This...\n\n---------------------\nSuppo
 rt in OpenMP for Multi-GPU Parallelism\n\nTorres, Kale, Malik, Scogland, F
 errer...\n\nNodes of emerging supercomputers have multiple GPUs, i.e., a m
 ulti-GPU, on them. Applications are often parallelized across the GPUs of 
 a multi-GPU using MPI, but a more performant and portable solution for par
 allelizing across the GPUs is needed. OpenMP, which is used to parallelize
  computation wit...\n\n---------------------\nPerformance Analysis of Cont
 ainerized OrangeFS in HPC Environment\n\nYildirim, Tang, Kougkas, Sun\n\nC
 ontainerization is on the rise in cloud computing due to its benefits in r
 eproducibility, portability and lightweight properties. The need for scala
 ble and reproducible products and the necessity to interoperate on everyth
 ing from local machines to large-scale HPC resources have made HPC solutio
 ns ...\n\n---------------------\nBreadth-First Search on Xilinx Versal\n\n
 Prado Alves, Minutoli, Belviranli, Tumeo\n\nThe new Xilinx Versal Platform
  provides a highly heterogeneous system to programmers. How these diverse 
 resources can be utilized effectively is an open question. This project im
 plements breadth-first search (BFS) on this platform, utilizing all availa
 ble regions to accelerate this workload. This is...\n\n-------------------
 --\nTowards an Efficient Parallel Skeleton for Generic Iterative Stencil C
 omputations in Distributed GPUs\n\nde Castro, Santamaria-Valenzuela, Migue
 l-López, Torres, Gonzalez-Escribano\n\nIterative stencil application
 s present a high degree of parallelism. The approach is appropriate for mo
 dern many-core systems, like GPUs. The huge arrays needed in many real pro
 blems, however, require the memory of multiple GPUs. Manually programming multi-G
 PU solutions is complex, involving synchronizatio...\n\n------------------
 ---\nFlexible GMRES with Analog Accelerators\n\nGupta, Kalantzis, Squillan
 te, Wu, Avron...\n\nThis research poster describes advances in the solutio
 n of general sparse linear systems by applying the preconditioning step pr
 imarily through the help of an analog crossbar array. These architectures 
 can achieve high degrees of parallelism with low energy consumption by map
 ping matrices onto array...\n\n---------------------\nEnabling Combustion 
 Science Simulations for Future Exascale Machines\n\nRood, Henry de Frahan,
  Day, Sitaraman, Yellapantula...\n\nReacting flow simulations for combusti
 on applications require extensive computing capabilities. Leveraging the A
 MReX library, the Pele suite of combustion simulation tools targets the la
 rgest supercomputers available and future exascale machines. We introduce 
 PeleC, the compressible solver in the Pe...\n\n---------------------\nCapt
 uring Relationships Based on Structure Similarity for Self-Describing Scie
 ntific Data Formats\n\nNiu, Zhang, Byna, Chen\n\nMany scientific data sets
  use self-describing files for storing data, and files within a data set a
 re often isolated without any definition of relationships among them. Beca
 use of the isolated management of scientific data files, locating, assimil
 ating and utilizing relationships for a given query r...\n\n--------------
 -------\nCore-Idling on MPI Intra-Node Communication Channels for Energy E
 fficiency\n\nKim, Jin, Byun\n\nBusy-waiting used to implement parallel pro
 gramming models can provide low latency but at the expense of energy. Ther
 e have been several studies to enhance the energy efficiency of MPI librar
 ies and applications by lowering the CPU speed during busy-waiting or turn
 ing off CPU components. While most ...\n\n---------------------\nOptimizin
 g and Extending the Functionality of EXARL for Scalable Reinforcement Lear
 ning\n\nChenna, Cosburn, Ezeobi, Moraru, Lim...\n\nEasily eXtendable Archi
 tecture for Reinforcement Learning (EXARL) is a scalable reinforcement lea
 rning framework, part of the Exascale Computing Project (ECP)-funded ExaLe
 arn program, designed to facilitate reinforcement learning (RL) research f
 or complex scientific environments. In RL, agents are a...\n\n------------
 ---------\nA Fast Parameter-Free Preconditioner for Structured Grid Proble
 ms\n\nKumar, Aggarwal, Kakkar\n\nA fast, robust, parallel and parameter-fr
 ee version of a frequency-filtering preconditioner is proposed for linear 
 systems corresponding to the diffusion equation on a structured grid. The prop
 osed solver is faster than the state-of-the-art solvers.\n\n--------------
 -------\nMonitoring Urban Changes with Ensemble of Neural Networks and Dee
 p-Temporal Remote Sensing Data\n\nZitzlsberger, Podhoranyi, Svatoň, L
 azecký, Martinovič\n\nUrban change detection with remote sensing data
  covers a wide field of applications like understanding socio-economic imp
 acts, identifying new settlements, or analyzing trends of urban sprawl. It
  has been used for decades. Analyses, however, are usually carried out manually
  by selecting high-quality sampl...\n\n---------------------\nSimilarity Me
 asurement for Proxy Application Fidelity\n\nChen, Aaziz, Cook, Wildani\n\n
 Proxy applications, designed to represent similar but much larger and more
  complex parent applications, are widely used for system co-design and pro
 curement. Quantitative validation, however, to prove that proxies are fait
 hful representations of their parents, is missing. In this work, we apply 
 seve...\n\n---------------------\nHDF5 VOL Connector to Apache Arrow\n\nYe
 , Kougkas, Sun\n\nApache Arrow is widely used in big data analysis and clo
 ud computing because of its standardized in-memory column format. It
  is a columnar, in-memory data representation that enables analytical syste
 ms and data sources to exchange and process data in real-time. It could cr
 eate an in-memory colu...\n\n---------------------\nFusion Research Using 
 Azure A100 HPC instances\n\nSfiligoi, Candy, Subramanian\n\nFusion simulat
 ions have in the past required the use of leadership scale HPC resources t
 o produce advances in physics. One such package is CGYRO, a premier multi-
 scale plasma turbulence simulation code. CGYRO is a typical HPC applicatio
 n that would not fit into a single node, as it requires O(100 GB...\n\n---
 ------------------\nAccelerating the Visualization of Spatio-Temporal Simu
 lations with Non-Evolving Meshes\n\nRibes, Geay, Westphal\n\nNumerical sim
 ulations, such as finite elements, deal with both space and time discretiz
 ations of fields. For the discretization of space, a mesh is normally used
 . Concerning the temporal evolution of the spatial discretization, in the 
 general case, a different mesh can be used per time step. It is o...\n\n--
 -------------------\nPadding to Extend the Bruck Algorithm for Non-Uniform
  All-to-All Communication\n\nFan, Gilray, Kumar\n\nThe latency of the stan
 dard MPI_Alltoallv implementations is linear in the number of processes. S
 uch linear complexity performs poorly when applications are deployed on mi
 llions of cores for short messages, whose cost is dominated by latency. Bruck's
  algorithm is a classic logarithmic algorithm for uniform...\n\n------------
 ---------\nDetecting and Identifying Applications by Job Signatures\n\nLi,
  Cook, Chen\n\nKnowing the applications of jobs running in high-performanc
 e computing (HPC) systems is invaluable for administrators. This research 
 aims to detect and identify applications through job signatures built upon
  monitoring traces obtained from the LDMS monitoring infrastructure on Cor
 i. By constructing ...\n\n---------------------\nUtilizing Persistent Memo
 ry in Parallel I/O Libraries\n\nLogan, Lofstead, Levy, Widener, Sun...\n\n
 Scientific applications use parallel I/O (PIO) libraries to manage the com
 plexity of I/O to distributed storage efficiently. PIO libraries, however,
  have not adequately adapted to the emergence of persistent memory (PMEM),
  which provides performance comparable to DRAM. They interact with PMEM us
 ing ...\n\n---------------------\nFPGA-Accelerated Ripples\n\nNeff, Minuto
 li, Tumeo, Becchi\n\nInfluence Maximization is an important graph algorith
 m that is gaining traction in areas where social networks and other relate
 d graphs are processed and analyzed. The long runtime of the algorithm ope
 ned the door for optimizations, but it is challenging to parallelize and port
  onto novel architecture ...\n\n---------------------\nHashed-Coordinate S
 torage of Sparse Tensors\n\nLowe, Charles, Singh\n\nTensors, or n-way arra
 ys, are becoming increasingly important in many fields. In recent applicat
 ions, tensors are extremely sparse and have such a high degree that dense 
 storage becomes intractable. To address this, several sparse storage forma
 ts have been proposed, with most formats being some vari...\n\n-----------
 ----------\nTowards Optimal Graph Coloring Using Rydberg Atoms\n\nVitali, 
 Viviani, Vercellino, Scarabosio, Scionti...\n\nQuantum mechanics is expect
 ed to revolutionize the computing landscape in the near future.  Among the
  many candidate technologies for building universal quantum computers, Ryd
 berg atoms-based systems stand out for being capable of performing both qu
 antum simulations and working as gate-based univers...\n\n----------------
 -----\nParallel GRB Source Localization Pipelines for the Advanced Particl
 e-Astrophysics Telescope\n\nSudvarg, Wheelock, Buhler, Buckley, Chen\n\nTh
 e Advanced Particle-astrophysics Telescope (APT) is a planned space-based 
 observatory to survey the entire sky for gamma-ray bursts (GRBs). It seeks
  to promptly detect these transient events, then communicate with narrow-b
 and instruments for follow-up observations. To this end, we are developing
  a...\n\n---------------------\nHardware Acceleration of Complex Machine L
 earning Models through Modern High-Level Synthesis\n\nCurzel, Tumeo, Ferra
 ndi\n\nMachine learning algorithms continue to receive significant attenti
 on from industry and research. As the models increase in complexity and ac
 curacy, their computational and memory demands also grow, pushing for more
  powerful, heterogeneous architectures; custom FPGA/ASIC accelerators are 
 often the b...\n\n---------------------\nCode Generation and Optimization 
 for Deep-Learning Computations on GPUs via Multi-Dimensional Homomorphisms
 \n\nSchulze, Rasch, Gorlatch\n\nWe present our work-in-progress code gener
 ation and optimization approach for DL computations based on the algebraic
  formalism of multi-dimensional homomorphisms (MDH). We show that popular 
 DL computations can be expressed in the MDH formalism, thereby exploiting 
 the already existing MDH GPU code ge...\n\n---------------------\nAccelera
 ting Parallel Monte Carlo Simulations for Statistical Physics: Portability
  on Many-Core Processors\n\nSattar, Li\n\nMonte Carlo (MC) simulations are
  important tools for studying thermodynamics and materials properties at f
 inite temperature. Scaling up to larger systems allows for simulations com
 parable to experimental length scale. In this work, we focus on improving 
 the weak scaling performance to simulate large...\n\n---------------------
 \nOpen-Source High-Performance Computing for Applications in Engineering: 
 DEM, SPH and Multi-Agent Vehicle Simulations with Project Chrono\n\nZhang,
  Hu, Benatti, Fang, Zhou...\n\nThis poster touches on three HPC-related re
 search thrusts based on Project Chrono. Chrono::GPU and Chrono::FSI levera
 ge GPU computing to accelerate large-scale granular dynamic simulations (b
 ased on DEM and SPH respectively), while SynChrono exploits MPI to support
  real-time multi-agent autonomous v...\n\n\nRegistration Category: Tech Pr
 ogram Reg Pass, Exhibit Hall Only
END:VEVENT
END:VCALENDAR
