Introducing the Students@SC Data Science Competition
As a Principal Member of Technical Staff at Sandia National Labs, I work on large-scale data management issues and collaborate with people from around the world. I am the Students@SC Chair this year, and Students@SC is the home of this new program within the overall SC organization. My first SC was in 2006 as a grad student, and I have attended every year since. At SC17, Michela Taufer asked me to work with the lead student volunteers (now SCALE) for SC19, which pulled me into the planning and organization side of Students@SC, where I have worked each year since.
Why did we create this new competition?
There were two motivations. The first is that data science has been incorporated into SC for several years now, but it hasn’t been represented well within the Students@SC program. It is increasingly clear that no matter what field students get involved in, they will have to analyze large data sets to learn something. This is true for the humanities, social sciences, physical sciences, and clearly many commercial applications. Finding a good way to offer a stronger program for students in this area would both bolster the Students@SC program with a broader audience we have not been as focused on and better emphasize the rightful place of Data Science at SC.
Second, when I attended the CHPC annual conference in December 2019, I saw the Student Datathon Challenge and realized that it was the answer. The students were given a single data set in which all teams had to find answers. Then they had to find a second data set that had something to do with South Africa (their home country) and find an answer there. Seeing this, I knew I had to bring it to SC. Unfortunately, DIRISA has not been able to participate in the SC version of the event due to the impact of COVID-19 in South Africa. The two programs that already have data science components are Computing4Change and HPC in the City. They are focused on first-exposure students. Once the students have participated in these programs, they have no further engagement with SC unless they make a concerted effort themselves. The Data Science Competition offers a way for the alumni of these programs to stay involved for another year and brings them more strongly into the Students@SC program.
When will it be held?
We will hold the Data Science Competition immediately before the SC conference and are aiming to bring the competitors to SC to attend the full conference with partial support. Should they wish to have a more integrated experience, they could also apply to be a student volunteer without any schedule conflicts with the competition. In this way, we are looking to incorporate a new group of students into the SC community.
How will it be structured?
The currently proposed structure of the Data Science Competition is similar to the DIRISA event, but split across nine days. This will enable undergraduate students to do the substantial work on the weekends and spread the slower work across the weekdays. Consequently, the competition should not interfere with their studies, and it is less intense than the original 4- or 5-day event. On the first weekend, all teams will work on a fixed data set. Over the subsequent weekdays, the teams will have to find a data set related to the region in which SC is being held, St. Louis, and gain approval by Friday afternoon. Over the next two days, the teams will solve a data science question and build their presentations about what they have learned and the impacts. The teams will be judged based on these presentations, with the results shared on GitHub, via a journal paper, or both. The specific structure is still being finalized and is subject to change as we work out the details of what is possible given available resources and the number of participating teams.
Who is chairing this new program?
Kristen Brown was chosen as the chair for this competition based on her interest, the alignment with her day job, and her several years of exceptional performance as a volunteer for Students@SC. With Kristen at the helm, I am confident that the competition will be successful. She has assembled a small team of dedicated experts to help her make this first event one to remember.
I started attending SC as a student volunteer in 2016 after one of my HPC & scientific visualization professors, Bruce Loftis, recommended me for the program. I continued as a lead student volunteer (known as the SCALE program today) and eventually joined the planning committee. As a data scientist, I was excited to learn from Jay Lofstead that there was interest in adding this new program and bringing more students interested in the field into the HPC community.
What is this new program all about?
With the increasing number of data science-focused HPC workloads, we’re interested in expanding the opportunities for students to learn about these kinds of projects and in building additional educational paths into the SC programs. We’re introducing a new, remote data science competition for teams to compete in before the conference this year as part of the Students@SC program. Holding the competition before the conference will enable the competitors to fully participate in and attend SC21, and we intend to involve them more broadly in the SC events as well. Finally, we also want to continue engaging with students interested in solving problems that are important to them and to the local community.
What should participants expect?
Participants will need to form their teams (4-5 students) ahead of time, be prepared to spend some time learning about relevant topics for the competition, and think deeply about the kinds of problems they’d like to approach using data. Prior to the competition, they will have the opportunity to attend webinars and workshops covering the skills and tools they’ll need to be successful. Participants should expect to set aside two weekends before SC21, one for each phase of the competition. Teams will first address the same dataset and problem space, followed by independent projects. We’ll encourage teams to focus on problems relevant to St. Louis this year, but the categories they can explore are broad, including areas such as education, climate, and healthcare. More details about this will be announced soon.
In the end, teams will present their solutions for judging in both phases, and overall winners will be announced during the Students@SC awards ceremony. Their work will also be shared publicly to showcase the different problems each team focused on and the solutions they found.
How can I apply, what is the deadline, and what qualifications are needed?
Applications are tentatively set to open mid-April via the SC Submissions Site (view a sample form). One team member should fill out the form for their team. The deadline to apply will be early August. We expect that teams of 4-5 students will apply together and have some exposure to data science or related topics before joining, though they don’t have to be experts. There are no requirements for area of study, and we encourage teams to come in with a mixture of backgrounds to run a successful project. We’re opening the competition to undergraduate students and first-year graduate students, who must be at least 18 years old at the time of the competition.
What “side events” will be available to participants?
We’ll be hosting social activities for the students so they can interact with other teams and other programs happening before and during SC. Prior to the event, students will attend workshops and webinars to learn new skills and fill knowledge gaps to prepare for the competition. We will encourage students to participate in the Students@SC programming, mentorship, and networking activities. Because the competition is held ahead of the conference, students can participate fully in both and engage with the broader SC community. They can even apply to be a part of other programs, like the Student Volunteers.
As a leader, what do you look forward to?
I’m really excited to see what kinds of problems the teams will be drawn to and how they approach them. I’m always amazed by the way students approach new problems in this field. I’m also looking forward to engaging with the local community in St. Louis and continuing to bring wider communities of students into the world of HPC. We hope that by participating in any one of our student programs, the students will become excited about careers in computing fields and within their local communities.
Christine Baissac-Hayden, SC21 Students@SC Communications Liaison
Christine Baissac-Hayden created Easy English 4 All, which provides multilingual communication tools for clients from diverse backgrounds in the renewable energy, medical, defense, marine science, and film industries. Easy English 4 All provides English as a Second Language (ESL), French, Spanish and Japanese tutoring from certified native-speaking teachers and organizes international student exchanges with personalized objectives and goals.