Carnegie Mellon University

Data Science Cup

February 24, 2017

Eyes on the Prize: Tartan Data Science Cup

By Ann Lyon Ritchie

The Tartan Data Science Cup (TDSC) lets students try their hand at being a data scientist.

The third installment of the Department of Statistics' competition teamed students up to solve a real-world data analysis problem under a tight deadline, all for a little notoriety. The names of the winning team members were engraved on the trophy for all to see.

Capital One's Center for Machine Learning sponsored this year's event, awarding a $50 Amazon gift card to each member of the top teams and an Apple TV to each member of the first-place team. Capital One employees also served as judges.

Statistics Professors Rebecca Nugent and Sam Ventura organized the event.

"We were excited to partner with Capital One's Center for Machine Learning. They have been long champions of CMU programs and work on a wide array of incredibly interesting data science problems. Their insights and experience were invaluable in discussions with our students," Nugent said.

The data set included detailed information about thousands of loans, including their repayment status. Teams were challenged to build a model that could predict whether customers would default on their loans.

"Our philosophy when training students in statistics and data science includes ensuring that students have all the skills necessary for success. All episodes of the TDSC involve data manipulation and computing, visualization, statistical modeling, written communication and oral presentation. The winning teams showed breadth in all of these areas, rather than excelling in one individual area," Ventura said.

Kweonwoo Jung, a senior in mathematical sciences, called his team's second-place finish "very refreshing," adding that although he had competed well in similar online contests through Kaggle and Numerai, he had yet to place at the top.

"I was happy to see our model actually performed well in practice. Being able to present our work to CMU's statistics department and employees of Capital One was also a precious experience," Jung said.

Some participants had vied for the Cup in prior years. Lina Sheremet, a senior in statistics, returned with her team and won "best data visualization" for the second year in a row.

She praised the educational benefits of the event, including the opportunity to analyze a realistic data set, to meet and talk with Capital One employees and to work under a time constraint, much as a data scientist would on the job.

Sheremet's team used Tableau Software to make a map of customer data, color-coded by each customer's estimated probability of defaulting on a loan given certain characteristics in the data set.

"As statistics majors, our team members have learned that data tells a story. The Data Science Cup allows us to find our own interpretation of that story with many resources for help and support," Sheremet said. "Too bad two-thirds of our team is graduating this spring, or we would continue participating next year!"


Top Teams:

First Place: Sally McNichols, Andy Liu for "paranormal"
Second Place: Kweonwoo Jung, Maria Rodriguez De La Cruz, Nikita Gupta for "Anything Random?"
Best Data Visualization: Lina Sheremet, Joey Gibli, LéShaun Jones for "Bae's theorem"
Best Technical Writing: Christopher Morris, Grace Yu, Michael You for "Mitophace"