Consumption and Distribution of Data Sets in the Cloud
Cloud and Distributed Computing
Time: Wednesday, 17 November 2021, 11am - 11:30am CST
Description: Researchers, developers and educators use public data sets to explore and test hypotheses, teach and learn about data use, and develop new computational simulations and AI/ML algorithms. While vast amounts of public data are available through government-funded and other organizations globally, including existing cloud vendors, it is often difficult to access, manage and use the raw data and to extract knowledge from it. End users need cleaned, curated data, along with the tools and licenses to combine, distribute, re-use and collaborate around data sets.
In this talk we will present a new solution from Oracle for distributing large data sets within the scientific and developer community to facilitate accelerated, domain-specific use of open data sets. Our platform targets key pain points of researchers, educators, students and developers who create, use and manipulate large data sets in their daily work. It will enable data producers to publish and share data easily by providing access to curated data sets. We will show examples of use and discuss the challenges that were addressed to reduce friction in data distribution. We will finish the presentation with a roadmap and a discussion of some of the challenges and open research problems that still need to be addressed in consuming, sharing and evolving data in the cloud.