March 18, 2016
Statistics Department Hosts First Tartan Data Science Cup
By Emily Stimmel
Computer programmers have hackathons. The machine learning world has Kaggle competitions. And now, Carnegie Mellon University’s budding statisticians have a competition to call their own.
Professors Sam Ventura and Rebecca Nugent organized the event series as a way to showcase the department’s strengths in data science. Students were tasked with creating elegant data visualizations, selecting appropriate statistical methods to apply to a problem, collaborating with peers across disciplines and communicating their results to a broad audience.
“We wanted to host a series of events that would allow our students to not only showcase their impressive data analysis skills, but their creativity in collaborating to solve real-world problems,” said Ventura, visiting assistant professor of statistics. “We find that our students gain the most valuable experiences when solving real-world problems by analyzing large, complex datasets.”
For the debut “episode,” over 100 students representing each of CMU’s undergraduate colleges used data on the New York City bike share system, Citi Bike NYC, to determine where two new bicycle stations should be added.
Students implemented a wide variety of solutions, including geospatial mapping techniques and pattern recognition. The winning team, Real Distributions Have Curves, used features like trip duration, start and end time and the birth year of the rider to predict the gender of non-subscribers. The team used this information to recommend station placement that would provide the largest increase in female ridership.
According to Nugent, teaching professor of statistics and co-director of undergraduate studies, the students’ diverse solutions were impressive, as were their varied academic backgrounds and experience levels.
“The winning team had two freshmen and a sophomore, so it’s anyone’s game!” she said.
Apoorva Havanur, a sophomore majoring in statistics and machine learning, was a part of the winning team. Havanur enjoyed applying programming and statistics techniques outside the classroom.
“Our team had really good chemistry and it was fun to bounce ideas off of one another throughout the day,” said Havanur. “It was also encouraging to know that freshmen and sophomores could compete against upperclassmen and still do really well with enough hard work.”
Entries were judged by David White (HNZ’08), executive director of Pittsburgh Bike Share; Christopher Genovese, head of the Statistics Department; and Ryan Tibshirani, assistant professor of statistics.
The next Tartan Data Science Cup (TDSC) episode will take place in early fall 2016 in conjunction with the Technical Opportunities Conference (TOC) and Business Opportunities Conference (BOC). It will have company sponsorship, along with networking and recruiting opportunities.