BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/Chicago
X-LIC-LOCATION:America/Chicago
BEGIN:DAYLIGHT
TZOFFSETFROM:-0600
TZOFFSETTO:-0500
TZNAME:CDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0500
TZOFFSETTO:-0600
TZNAME:CST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20211207T055409Z
LOCATION:Second Floor Atrium
DTSTART;TZID=America/Chicago:20211116T083000
DTEND;TZID=America/Chicago:20211116T170000
UID:submissions.supercomputing.org_SC21_sess241_spostu111@linklings.com
SUMMARY:Mitigating the Metadata Mess: Autonomous Metadata Extraction Pipeli
 nes for Large-Scale Data Repositories
DESCRIPTION:ACM Student Research Competition: Graduate Poster\, ACM
  Student Research Competition: Undergraduate Poster\, Posters\n\n
 Mitigating the Metadata Mess: Autonomous Metadata Extraction Pipelines
  for Large-Scale Data Repositories\n\nChen\, Hsu\n\nMany scientific
  repositories are rendered useless due to their enormous size
  (exceeding petabytes of data across billions of files) and lack of
  descriptive metadata to aid discovery\, understanding\, and use.
  Building on a distributed metadata extraction service\, Xtract\, we
  propose a scheduler designed to optimize the amount of metadata
  extracted from large scientific repositories subject to finite
  compute budgets. We accomplish this by leveraging machine learning
  models to predict the likelihood that each metadata extractor can
  retrieve nonempty metadata from each file. We then feed these
  probabilities along with other file attributes into a scheduler that
  maximizes metadata yield over time. We demonstrate the viability of
  the scheduler on a real-world data repository and show improved
  metadata extraction performance by measuring the quality of the
  metadata extracted as the scheduler passes through each
  file-extractor pair.\n\nTag: In-Person Only\n\nRegistration Category:
  Tech Program Reg Pass\, Exhibit Hall Only
END:VEVENT
END:VCALENDAR
