Early Career Lightning Talks - Processing NERSC Python user data on GPUs and sharing the results in a public dashboard
Event Type
Workshop
Online Only
Career Development
Diversity Equity Inclusion (DEI)
Education and Training and Outreach
HPC Community Collaboration
Time: Sunday, 14 November 2021, 3:45pm - 3:50pm CST
Location: Online
Description: At NERSC, we collect data about how our Python users are using our systems. Within the past year, we have upgraded our data collection to capture data from all of our Python users, not just those using the NERSC-provided Python modules, and performed new, more sophisticated analysis to gain better insight into how people use Python at NERSC. We published these results in R. Thomas, L. Stephey, B. Cook, A. Greiner, "Monitoring Scientific Python Usage on a Supercomputer," Proceedings of the 20th Python in Science Conference (SciPy), July 2021 (https://doi.org/10.25080/majora-1b6fd038-010). In this lightning talk I'll focus on the workflow we used for analyzing the data on GPUs, including Jupyter, Papermill, Voila, and the NVIDIA RAPIDS ecosystem (specifically cuDF and CuPy). I'll discuss the main findings, including the top Python libraries used, the breakdown of Python job sizes, and correlations between Python libraries, and I'll share our public web-based dashboard of the results. I'll also talk about the challenges we encountered with this workflow and our plans for future work.