Carnegie Mellon University

What is Educational Data Mining (EDM)?

Educational Data Mining is about improving learning outcomes by mining and analyzing data collected as we teach. Like their counterparts in science and business, educational researchers see the potential to dramatically improve learning through this type of research. And it has become easier: in the past, collecting the kind of data that could inform best practices was an expensive endeavor, but it is now possible to collect tremendous amounts of data easily and efficiently.

How is the data being used?

The educational data mining community is using large amounts of data to validate research findings at scale. Additional data also makes predictions of student knowledge, dropout, and motivational state much more accurate. By mining large amounts of data, we gain a broader understanding of specific groups of students, which leads to better adaptivity and personalization for individuals.

What kind of data is being collected?

A wide variety of educational data is becoming increasingly available. Some of it comes from instructors’ efforts to record grades and some from school administrative systems, but more and more is being produced as a natural side effect of educational technology use.

The kinds of educational technology data being collected vary (along the left in Figure 1) from simpler-to-interpret data, such as clicks on menu items or structured symbolic expressions, to harder-to-interpret data, such as free-form essays, discussion board dialogues, or affect sensor data. Data is also collected at different time scales (see the Time Scales axis along the bottom in Figure 1 below). For example, click actions are observed within seconds in fluency-oriented math games or in vocabulary practice; problem-solving steps are observed every 20 seconds or so in tools (e.g., spreadsheets, graphers, computer algebra) within intelligent tutoring systems for math and science; and, in massive open online courses (MOOCs), answers to comprehension-monitoring questions are given and learning resource choices are made every 15 minutes or so.

In other examples, lesson completion is observed across days in learning management systems, chapter/unit test results are collected after weeks, end-of-course completion and exam scores are collected after many months, degree completion occurs across years, and long-term human goals like landing a job and achieving a good income occur across lifetimes.


Why are people in the education community so excited by EDM?

People are excited about the potential of big data in education because they have seen the impact of data both in other scientific fields and in business. Imagine the potential for learning gains if schools relied on data about their students as much as online retailers and social media sites do for their customers and users.

Another reason for excitement is the growing availability of data on learning as educational technology becomes widely adopted. The need for better evidence to guide educational programs is another important factor. Not only have we seen a push for greater accountability for student learning outcomes in K-12 education, but critics of college-level education have also pressed on whether students are learning as much as they should. Policy-makers have been calling for evidence-based design and decision making in education to greatly improve practices and student outcomes. These are great opportunities for big data in education.

The Simon Initiative and CMU's Involvement in the EDM movement

CMU and the Simon Initiative have been and continue to be at the forefront of the EDM movement. Educational Data Mining was started by Ryan Baker, HCI PhD and original director of DataShop; John Stamper, HCI faculty and current director of DataShop; and Joe Beck, former CMU postdoc, along with others. A large fraction of the papers published at EDM conferences also comes from CMU. Since 2008, CMU has won a number of best paper awards at EDM, including in 2008, 2012, and 2013.


Try DataLab's Gradebook Calculator

Using assignment, quiz, and test scores from your gradebook, we can predict early in the term which of your students are at risk of not passing your course. Upload your data into our calculator and see a quick report on your class. GO TO CALCULATOR.


Interested in diving into key EDM concepts?

We’ve created a collection of short tutorial videos to help give you some additional background. GO TO KEY CONCEPTS.