Carnegie Mellon University

DataLab's EDM History

DataLab and DataShop, both part of the Simon Initiative, were born out of CMU, which has been at the forefront of the educational data mining (EDM) movement. As educational technology developed and was adopted, the availability of data on learning grew, and CMU researchers were among the first to mine it to discover new best practices and opportunities for advancing educational theories. DataShop began as the tool they used to mine their data and publish their work; it was then opened to the rest of the educational community and became the largest repository of educational data. Now it is being expanded through DataLab.

Evolution of EDM and CMU

1983: Work begins on the SOAR cognitive architecture and unified theory of cognition.

1995: The Center for Automated Learning and Discovery is formed.


2002: The Open Learning Initiative (OLI) begins at CMU, supported by the Hewlett Foundation.


2004: Paving the way to a data-driven understanding of robust learning, the Pittsburgh Science of Learning Center (PSLC) is established as a joint CMU-University of Pittsburgh initiative, with funding from the National Science Foundation.

2005: PSLC releases DataShop version 1.0, which is destined to become the world’s largest repository of educational data.

2010: Carnegie Mellon University postdoc Dr. Ryan Baker forms the Educational Data Mining Society. Its annual conference attracts hundreds of researchers from countries around the world.


2012: A study of students at six U.S. public universities shows that CMU's OLI statistics course (taught as a combination of online and in-class instruction) is just as effective as regular lecture classes, showing the potential of interactive learning systems to maintain quality and reduce cost.


2012: PSLC has more than 1,600 publications. DataShop has more than 500,000 student hours of data.


2013: Launch of the Simon Initiative.


2014: Launch of DataLab.

Significant Publications on EDM

Pioneers included Ryan Baker, HCI PhD and original director of DataShop; John Stamper, HCI faculty and current director of DataShop; and Joe Beck, former CMU postdoc, along with others. A large fraction of the papers published at EDM conferences also comes from CMU. Since 2008, CMU has won a number of best paper awards at EDM, including in 2008, 2012, and 2013.

Notable Citations

  • Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. (2010) A Data Repository for the EDM community: The PSLC DataShop. In Romero, C., Ventura, S., Pechenizkiy, M., Baker, R.S.J.d. (Eds.) Handbook of Educational Data Mining. Boca Raton, FL: CRC Press. (128 References on Google Scholar as of 9/10/14)
  • Rau, M., Scheines, R., Aleven, V., and Rummel, N. (2013) Does Representational Understanding Enhance Fluency - Or Vice Versa? Searching for Mediation Models. In Proceedings of the 6th International Conference on Educational Data Mining (EDM 2013). Memphis, TN. Pages 161-168.
  • Koedinger, K., McLaughlin, E., Stamper, J. (2012) Automated Student Model Improvement. In Proceedings of the 5th International Conference on Educational Data Mining (EDM 2012). Chania, Greece. Jun 19-21, 2012. Pages 17-24. (28 References on Google Scholar as of 9/16/14)
  • Shih, B., Koedinger, K., and Scheines, R. (2008). A Response Time Model for Bottom-Out Hints as Worked Examples. In Baker, R.S.J.d., Barnes, T., Beck, J.E. (Eds.) Educational Data Mining 2008: 1st International Conference on Educational Data Mining, Proceedings. Montreal, Quebec, Canada. June 20-21, 2008. Pages 117-126.
  • Martin, B., Mitrovic, T., Mathan, S., & Koedinger, K.R. (2011). Evaluating and improving adaptive educational systems with learning curves. User Modeling and User-Adapted Interaction: The Journal of Personalization Research (UMUAI), 21(3), 249-283. [2011 James Chen Annual Award for Best UMUAI Paper]