Carnegie Mellon University
May 07, 2020

Making Data Lemonade

COVID-19 Pandemic, Isolation an Opportunity for Pittsburgh-Area High Schoolers to Learn Data Science

Jocelyn Duffy, MCS Associate Dean of Communications

Lockdown status in each state versus number of COVID-19 cases, March 20-29, 2020. Each state turns gray on the date its lockdown began, and the number over each state is that day's count of new cases there. Credit: Breanna Franchak, Pine-Richland High School, and Cassidy Power, University of Pittsburgh.

It begins with a data-driven discussion of the national plan to reopen safely. Brian Macdonald, director of data science at a leading healthcare data analysis company, reviews the plan, which would phase out quarantine measures in a given state only after COVID-19 cases there show a sustained two-week decline.

His audience is not who you might expect. Listening in, via teleconference, are about 20 students from high schools across the Pittsburgh area.

Working groups made up of high schoolers and their undergraduate mentors from the University of Pittsburgh (Pitt) present their latest findings. An interactive map of the U.S. shows how reduced travel in each state correlates with the number of new cases. The correlation isn't strong; some states with more isolation are seeing more cases. “We probably should have taken under consideration population density,” says team member Breanna Franchak of Pine-Richland High School.

A graph from another working group shows the daily number of cases in each state after reaching 100 confirmed cases. They start there so they can get an apples-to-apples comparison of how the pandemic is growing regardless of what date the virus entered a given area.
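For readers curious how that alignment works, here is a minimal Python sketch. It assumes each state's data is a list of cumulative confirmed cases in date order; the function names and the numbers are hypothetical and are not the students' actual code or data.

```python
def align_from_threshold(cumulative_cases, threshold=100):
    """Drop the days before a state first reached `threshold` confirmed cases.

    Assumes `cumulative_cases` holds one cumulative count per day, in date
    order. Index 0 of the result is the first day at or above the threshold.
    """
    for day, count in enumerate(cumulative_cases):
        if count >= threshold:
            return cumulative_cases[day:]
    return []  # this state never reached the threshold


def daily_new_cases(cumulative_cases):
    """Convert cumulative counts into day-over-day new cases."""
    return [later - earlier for earlier, later in zip(cumulative_cases, cumulative_cases[1:])]


# Hypothetical numbers for illustration only, not real state data.
states = {
    "State A": [10, 40, 120, 300, 650],
    "State B": [5, 30, 60, 100, 140, 210],
}

for name, series in states.items():
    aligned = align_from_threshold(series)
    print(name, aligned, daily_new_cases(aligned))
```

Although State B crosses 100 cases a day later than State A, both aligned series start at their own "day 0," so their growth can be compared side by side.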

“There are different rates of change,” says Oliver Yao, a Pitt student acting as a mentor to the high schoolers. “It shows that there are different factors other than time; some areas are managing growth … better than others.”

Students enter the conversation to discuss and debate these factors. Which are important? Which affect the others?

These COVID-19 conference calls are part of a unique introduction to data science sponsored by Pitt, Carnegie Mellon University, the Pittsburgh Supercomputing Center, and area data-focused companies. They are also an exercise in turning lemons into lemonade.

The Pittsburgh Data Jam

The student COVID projects grew out of an ongoing program called the Pittsburgh Data Jam. Begun in the fall of 2013, the Data Jam is a collaborative project of Pittsburgh-area universities, companies and high schools.

Saman Haqqi, then an IBM data scientist and chair of an academic-corporate alliance called Pittsburgh DataWorks, launched the project as a flagship DataWorks effort along with Macdonald; Raja Sooriamurthi, a Carnegie Mellon University professor specializing in data education; and Cheryl Begandy, outreach director for the Pittsburgh Supercomputing Center.

The idea was to create an extracurricular data science education effort for high school teachers and their students, including site visits to data-oriented businesses and higher education institutions in the region and an informal competition among teams of high schoolers working on data projects.

“In the third year or so, we started running workshops for the students,” Sooriamurthi says. “The Pitt mentors took a pivotal role in that.”

Judy Cameron, a professor in the University of Pittsburgh Department of Psychology, began running an outreach effort at Pitt to recruit these college-student mentors.

“We give introductory talks to students in high schools around the general Pittsburgh area in the fall,” she says. “We show them how important Big Data is for absolutely everything—not just for science, and not just for business.”

Topics of these talks include how to get involved with the Data Jam, how to pick a research question, how to access and organize data, and how to visualize data to make sense of it.

Oliver Yao, a University of Pittsburgh undergraduate studying economics, has been one of those student mentors.

“It’s really driven by the high school students,” Yao says. “We’re just giving them ideas … my job, I feel, is to provide them with a safe space to fail. To try things and say, ‘Oh, that didn’t work out,’ but it’s OK. It’s a balance; we try to let them have their own room to discover.”

“It’s inspiring to work with them just because a lot of them aren’t 100 percent sure what they want to do yet,” says Alexa Spaventa, a mentor and Pitt student in the Computer Science and Digital Narrative program. “It’s made me more passionate about what I enjoy in computational science, because I get to share it with them.”

The COVID Curve Ball

Back to lemonade.

The Data Jam had been on a roll, growing from a handful of schools to more than 20 in recent years. The 2020 Data Jam was on track for final student presentations in early April. Then the coronavirus threw everyone a curve ball.

“We didn’t think it was possible to continue with the original Data Jam,” says Macdonald. But inspiration struck when he realized how bored his own daughter was, confined to their house.

“I was just thinking about Data Jam, doing my own analysis of COVID as more data became available,” he says. “I had a daughter in high school at home with all this free time .... She couldn’t go out to socialize. I thought this might be an interesting project for the students to work on. It’s relevant, and you feel a bit of control as you see how the virus spreads, effects on it, how long it might last. I asked the Data Jam advisory board, and they said, ‘Yes.’ Within a week, we were getting the students involved.”

Because not every student had the same access to resources at home, a formal competition was out. But a less formal program could allow students from schools that hadn’t been involved with Data Jam to participate. So the Data Jam organizers broadcast a call to students in the area.

“We were very clear when we sent out the messages to teachers that it was completely separate from Data Jam,” he says. “No requirements, no teams, anybody could join. In many ways, it was a free-for-all.”

One of the students who answered was Amber Murphy of Pine-Richland High School, which had not participated in the Data Jam.

Murphy was a natural for the program. She’d already been introduced to data science through her science fair project, which looked for correlations between HIV and medical conditions such as dental disease and hypertension by comparing how often those conditions occur in people with and without the virus.

Learning the Ropes of Data Science

In keeping with the goals of the original Data Jam, the COVID student projects are giving high school students a taste of data science, how it works and whether it might be something they’d want to consider for a future career.

Working with Oliver Yao as their student mentor, Murphy and the rest of her group have looked at a number of factors in the COVID pandemic, including its economic impact. They’re currently developing a possible survey of local businesses to learn which ones have sought federal aid.

Tony Robol, a student at Peters Township High School—which also had not been participating in the Data Jam—tried his hand at creating a mathematical model that might help predict the course of the COVID pandemic in Pennsylvania.

Using state data on COVID-19 cases, hospitalizations and hospital beds, he found that a relatively simple equation closely tracked the number of cases through April 21. His model predicted that COVID hospitalizations would peak at just under 4,000, well below the state’s 13,500 available beds.

Importantly, though, Robol’s analysis included a number of caveats. Would the equation he’d picked continue to be accurate? And does a recent spike in Pennsylvania cases, which his model did not anticipate, cast further doubt on the prediction?

“The spread of Coronavirus is very complex,” Robol says, with many possible factors affecting it. “Obviously we want to prevent cases, prevent deaths in the long run, but the problem with flattening the curve is its effects on society, its economic effects, its mental health effects.”

“Starting with something basic can lead you to factors that initially you might not have thought about,” Pine-Richland student Murphy says. “It leads to a radically different outcome than you would have thought.”

Learning how to try things out, recognize when something isn’t working, and decide whether to keep trying or move on to something better are all part of the lesson, Robol adds. “When you hit a bump in the road, you have to either go around it or go through it.”

Coping—and Sticking to the Mission

Robol’s advice applies not only to the lessons learned but also to the role the projects play in the students’ lives.

“This project is definitely helping with the stay-at-home situation,” he says. “We just started back at school [remotely], and with the COVID projects I’ve been busier than I have been at school. It’s helping me stay engaged, and not bored.”

The Data Jam organizers have not given up on their central mission of bringing the potential of data science to high school students.

“When I presented at an international conference, a number of people from Europe were quite impressed we were able to do this, and not as just a weekend effort,” Sooriamurthi says. “One of the things that caught many people’s eyes was how many young women were involved.”

Once the pandemic is over, he hopes to again reach students with less access to resources at home, and he sees the new students and schools drawn in by the COVID projects as another opportunity for future Data Jams to grow. Eventually, he hopes, high schools will have data-based curricula that can offer this opportunity to all students.

“We’ll know when we’ve succeeded when we no longer need Data Jam,” Sooriamurthi says.