Carnegie Mellon University
October 02, 2014

Carnegie Mellon Leads New NSF Project Mining Educational Data To Improve Learning

Distributed Storage System Will Make Data More Accessible, Secure

Contact: Byron Spice / 412-268-9068

PITTSBURGH—Carnegie Mellon University will lead a five-year, $5 million early implementation project sponsored by the National Science Foundation to improve educational outcomes and advance the science of learning by creating a large, distributed infrastructure called LearnSphere that will securely store data on how students learn.

By accessing more than 550 datasets generated from interactive tutoring systems, educational games and massive open online courses, or MOOCs, course developers and instructors will be able to improve teaching and learning through data-driven course design. Mining this educational data also will help researchers obtain deeper insights into how people learn.

Ken Koedinger, professor of human-computer interaction and psychology, will lead the project, which will include colleagues from CMU, MIT, Stanford University and the University of Memphis.

"We've seen the power that data has to improve performance in many fields, from medicine to movie recommendations," Koedinger said. "Educational data holds the same potential to guide the development of courses that enhance learning. Gathering more of this data also promises to give us a deeper understanding of the learning process."

LearnSphere received one of 14 data-driven research awards totaling more than $31 million announced today by the National Science Foundation under the Data Infrastructure Building Blocks (DIBBs) program. The program is now in its second year of funding, and this year's awards support research in 22 states and touch on topics in computer science and in every field of science supported by the NSF.

"NSF has an ambitious vision for advancing scientific frontiers through an enabling and collaborative data infrastructure," said Irene Qualters, NSF division director for advanced cyberinfrastructure. "We are particularly pleased that this year's DIBBs awards include this CMU-led project to build on the NSF-sponsored Pittsburgh Science of Learning Center's DataShop repository for educational researchers."

The Pittsburgh Science of Learning Center, created in 2004 and headed by Koedinger, studies how people learn and how learning can be made more robust. In the process, it created DataShop, the world's largest open educational data repository, which in turn spurred the rapidly expanding field of educational data mining.

Most of the information in DataShop is gleaned from heavily interactive systems, such as computerized tutoring systems and educational computer games, Koedinger noted. LearnSphere will add other types of educational data, including information about student behavior and performance in MOOCs.

Koedinger said he hopes to increase the amount of data shared through LearnSphere by making it a distributed storage system, rather than a centralized system. Researchers will store their data on their own servers, enabling them to maintain the confidentiality of their subjects and exercise greater control over which elements of that data can be accessed by outsiders. This should motivate more companies and researchers to share their data, he explained.

"We're trying to create a culture in which scientists will not only be cited for their research findings, but also for their datasets," Koedinger said.

A key tool to be created in this project is a graphical interface that will enable users to combine data sources and analytic tools. The project researchers will be developing new methods for integrating data, for collecting data from MOOC experiments and other learning environments and for making automated discoveries from the data.

The researchers also will investigate how to recognize when students get off track or might drop out, and will design effective interventions.

"Learning models based on the wide variety of datasets housed in LearnSphere will enable new forms of personalized, just-in-time support for learning," said co-investigator Carolyn Rosé, associate professor in the Language Technologies Institute and Human-Computer Interaction Institute at CMU.

Using learning science and technology to improve student learning is the focus of Carnegie Mellon's Simon Initiative. One element of that university-wide initiative, the Simon DataLab, was launched last year to make learner-interaction data more easily accessible to instructors. The infrastructure being created through the new LearnSphere promises to further enhance this effort.

John Stamper, director of DataShop, joins Koedinger and Rosé on CMU's LearnSphere team. Team members from MIT will provide expertise in MOOC datasets, Stanford will provide interactive learning data and MOOC data from the Online Learning Initiative, and Memphis researchers, along with Rosé, will contribute expertise on technology support for learning through discussion.

The Human-Computer Interaction Institute and Language Technologies Institute are part of Carnegie Mellon's top-ranked School of Computer Science, which is celebrating its 25th year. Follow the school on Twitter @SCSatCMU.