Winning Research

Chris Dyer

Die-hard football fans burn up the Twittersphere before the big game. All this excitement predicts a win, right? According to Carnegie Mellon University research, those fans shouldn't start celebrating: unfortunately, it's just the opposite.

The researchers used automated tools to analyze streams of tweets, averaging up to 42 million a day, across three NFL seasons (2010–2012). Plucking out messages with hashtags associated with individual NFL teams, they used a machine-learning algorithm to test a number of factors for predictive power.
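The hashtag-filtering step described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the hashtag-to-team mapping, the team codes and the sample tweets are all hypothetical, and the study's actual hashtag lists and feature set are not given in this article.

```python
import re
from collections import Counter

# Hypothetical mapping from hashtags to team codes (an assumption for
# illustration; the hashtags actually used in the study are not listed here).
TEAM_HASHTAGS = {
    "#steelers": "PIT",
    "#herewego": "PIT",
    "#ravens": "BAL",
}

HASHTAG_RE = re.compile(r"#\w+")


def teams_mentioned(tweet):
    """Return the set of teams whose hashtags appear in a tweet."""
    tags = (tag.lower() for tag in HASHTAG_RE.findall(tweet))
    return {TEAM_HASHTAGS[tag] for tag in tags if tag in TEAM_HASHTAGS}


def count_team_tweets(tweets):
    """Count how many tweets mention each team (at most once per tweet)."""
    counts = Counter()
    for tweet in tweets:
        counts.update(teams_mentioned(tweet))
    return counts


# Made-up sample tweets, not data from the study.
sample = [
    "Big game tonight #Steelers #HereWeGo",
    "Nervous about Sunday... #ravens",
    "Nothing to do with football #mondays",
]
team_counts = count_team_tweets(sample)
```

Per-team counts like these, tracked over the days leading up to a game, are the kind of signal that could then be handed to a learning algorithm as one candidate factor.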

"Our basic hypothesis was that fans, who pay close attention to the NFL, can reveal more than traditional statistics, such as passing or rushing yards," said Chris Dyer, assistant professor in CMU's Language Technologies Institute. "And Twitter gives us a convenient, 140-character-at-a-time perspective on what these fans are thinking."

"One thing we found is that, controlling for a few factors, if fans are tweeting a lot more about their team leading up to the game, they're probably going to lose. You might think people talk when they're confident, but something else might be going on, like nervousness."

Combining this result with other correlations, the researchers also found that, although they couldn't predict outright winners or scores, they could predict which teams would beat the point spread a little more than half the time.
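For readers unfamiliar with the point spread, the short Python sketch below shows what "beating the spread" means. It assumes the common convention in which the spread is a handicap added to a team's final score (negative for the favorite); the function name and example scores are hypothetical and do not come from the study.

```python
def beats_spread(team_score, opponent_score, spread):
    """Say whether a team beat the spread.

    `spread` is the handicap applied to the team's score, for example
    -3.5 for a 3.5-point favorite or +7 for a 7-point underdog
    (an assumed convention). Returns True or False, or None when the
    adjusted margin is exactly zero (a "push").
    """
    adjusted_margin = team_score - opponent_score + spread
    if adjusted_margin > 0:
        return True
    if adjusted_margin < 0:
        return False
    return None


# A 3.5-point favorite that wins by 4 beats the spread...
favorite_by_four = beats_spread(24, 20, -3.5)
# ...but the same favorite winning by only 3 does not.
favorite_by_three = beats_spread(24, 21, -3.5)
# A 7-point underdog that loses by 4 still beats the spread.
underdog_loses_by_four = beats_spread(20, 24, 7)
```

Because the spread is set to make either side roughly an even bet, predicting it correctly even slightly more than half the time is a harder target than it may sound.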

As can happen at a university committed to undergraduate research opportunities, this study began last year when Shiladitya Sinha (CS'13), then a senior majoring in mathematical sciences, approached Dyer looking for machine learning experience.

The Language Technologies Institute already had a broad research interest in the predictive possibilities of Twitter — they had 27 billion tweets archived — and Dyer suggested the NFL study.

Sinha readily agreed and completed the study with then-Ph.D. student Kevin Gimpel, Dyer and Noah Smith, associate professor of language technologies and machine learning. Sinha presented their findings at the Machine Learning and Data Mining for Sports Analytics conference in Prague, Czech Republic.

"I've had a number of undergraduates working with me since I came to CMU for my post-doctoral research," Dyer said. "I believe every single one of them has published a paper. They can really delve into and concentrate on a problem."

Students were the primary reason Dyer accepted a CMU faculty position.

"I could pretend that I was seriously considering other places but basically it wasn't even a question," Dyer said.

"It's the students. You can do anything here because the students are all so incredibly smart."

The Language Technologies Institute and Machine Learning Department are part of CMU's School of Computer Science.

