June 30, 2019
The Limits of Big Data
By Jason Maderer, Marketing and Communications
When Pedro Ferreira walks onto the stage in Dalian, China, at the World Economic Forum's 2019 Meeting of New Champions, a portion of his thoughts will be on the beach. And not because the host city overlooks the Yellow Sea.
Ferreira will offer his fellow conference attendees some advice: even as big data becomes increasingly available and useful, they must recognize its limits and not be naïve about cause and effect. To illustrate his point, he cites a classic example.
"If you only pay attention to the data, you'll notice that when ice cream sales at the beach increase, so does the number of drownings," said Ferreira, an associate professor of information systems in the Heinz College of Information Systems and Public Policy. "When ice cream sales decrease, fewer people drown. The data are correct, but we know one thing has nothing to do with the other."
And that's the point Ferreira will demonstrate in Dalian: simply obtaining more data doesn't help identify cause and effect. In the example above, temperature is a confounding factor that affects ice cream sales and beach attendance simultaneously. On hotter days, more cones are indeed sold. But, more importantly, hot days bring more people into the water, and that's what causes more drownings.
"Many people across many organizations, both public and private, believe that more data addresses most empirical problems. It's simply not true," said Ferreira, who also holds a joint appointment in the Department of Engineering and Public Policy. "In fact, big data costs more to collect, curate and analyze, and it is likely to provide misleading results unless the data are carefully manipulated."
Ferreira will urge attendees, especially policymakers and business leaders, to run randomized experiments to find causal effects. Once cause and effect are correctly identified using data from such experiments, leaders will have the knowledge they need to reach their objectives. This idea applies to both complex and simple situations, such as measuring the impact of Internet access in schools on student learning and the effect of price on sales, two examples that Ferreira will discuss in Dalian.
"This event is a great opportunity to talk about fundamental concepts of data science, which everyone is doing but not always doing the right way," he said. "It's exciting that Carnegie Mellon has a global platform to talk about the need for experimentation in the era of big data."
Ferreira's primary research focuses on how people use technology to consume experience goods and how they influence others to do so. These topics are closely linked to how firms behave and how public policies affect market structures. His recent studies include work on television binge-watching and on how time-shift TV technology affects viewers' habits.