SC21 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

TributaryPCA: Distributed, Streaming PCA for In Situ Dimension Reduction with Application to Space Weather Simulations


Workshop: DRBSD-7: The 7th International Workshop on Data Analysis and Reduction for Big Scientific Data

Authors: Yu Wang (University of Michigan) and Natalie Klein, Steven Morley, Vania Jordanova, Michael Henderson, Ayan Biswas, and Earl Lawrence (Los Alamos National Laboratory)


Abstract: Computer simulations continue to grow in size and complexity and are moving towards exascale. Simulations at this scale can generate outputs that exceed both storage capacity and the bandwidth available for transferring to storage, making traditional offline statistical inference challenging. Therefore, it is desirable to embed statistical analyses in the simulation framework while the simulation is running -- a strategy called in situ inference -- to alleviate the burden of storage. In this work, we focus on adapting Principal Component Analysis (PCA) -- a statistical method for reducing the dimensionality of big data -- to the in situ setting. We develop TributaryPCA: a distributed version of Oja's algorithm for streaming PCA that uses the Message Passing Interface (MPI) standard. Our approach significantly reduces data storage requirements relative to offline PCA and avoids excessive communication across compute nodes. We illustrate the method using data generated from the SHIELDS Framework for space weather simulation.
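
For readers unfamiliar with Oja's algorithm, the following is a minimal single-process sketch of the streaming update the abstract refers to. It is not the TributaryPCA implementation: it omits the MPI distribution entirely, and the function names (`oja_step`, `streaming_pca`, `make_stream`), the learning rate, the QR re-orthonormalization, and the synthetic zero-mean data are all assumptions made for illustration.

```python
import numpy as np

def oja_step(W, batch, lr):
    """One Oja update on a mini-batch, followed by QR re-orthonormalization.

    W:     (dim, k) current orthonormal basis estimate
    batch: (n, dim) zero-mean samples, used once and then discarded
    """
    # Sample-covariance-times-basis, computed without forming the covariance.
    grad = batch.T @ (batch @ W) / batch.shape[0]
    Q, _ = np.linalg.qr(W + lr * grad)
    return Q

def streaming_pca(batches, dim, k, lr=0.1, seed=0):
    """Estimate the top-k principal subspace from a stream of batches."""
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.standard_normal((dim, k)))
    for batch in batches:
        W = oja_step(W, batch, lr)
    return W

def make_stream(n_batches=400, batch_size=32, seed=1):
    """Hypothetical data source: 5-D Gaussian with most variance on axis 0."""
    rng = np.random.default_rng(seed)
    scales = np.array([5.0, 0.5, 0.5, 0.5, 0.5])
    for _ in range(n_batches):
        yield rng.standard_normal((batch_size, 5)) * scales

W = streaming_pca(make_stream(), dim=5, k=1)
# The estimated direction should align closely with the first coordinate axis.
print(np.round(np.abs(W[:, 0]), 2))
```

Only the `dim`-by-`k` basis `W` is held in memory between batches, which is the storage saving the abstract describes; in a distributed variant, each process would presumably apply a similar update to its local data and combine results through MPI, though the abstract does not specify how.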




