Key Concepts to Understand
There are a variety of approaches to assess, interpret, predict and model student knowledge. DataLab and DataShop tools were developed using an evidence- and science-based approach to learning and course design. This means that:
- Every course or instructional activity can be considered a hypothesis on how students can best learn the related outcomes.
- We can use the data to test and refine these hypotheses and improve the related learning activities.
Before exploring the tools, it can be helpful to understand a few key concepts you’ll come across as you use them. Different tools will make use of the approach described above in different ways. And the extent and quality of the evidence that’s available will have an influence on what tools can be used (and how much you can trust the answers that they offer). But this underlying notion — of learning as a testable hypothesis — is a common thread across all of the concepts below.
Modeling Skills and Knowledge
One key to the Carnegie Mellon approach is our understanding of learning as a process that isn't directly observable: because learning occurs inside a student's brain, we can't actually see it. Instead, we think about learning in terms of inputs (prior knowledge, motivation and the instructional activities in which the student engages) and outputs (the observable and measurable outcomes that a student achieves when learning has occurred).
When we think about learning in this way, it’s clear that having a model for what’s happening in the learner’s mind can be helpful or even necessary for us to refine our understanding of how learning works and to improve our instructional materials.
Knowledge Components (KCs) represent one approach to creating this type of cognitive model. KCs are a flexible way of relating or breaking down problems and learning activities into the components that are being practiced and acquired. Different theories might consider these components in different ways; they might be skills, principles, concepts, outcomes or schemas depending on who is doing the modeling. KCs offer a generalized way to accommodate these different approaches while still using common DataLab and DataShop tools to test hypotheses on how students can best acquire specific KCs, refine our models and improve the related activities.
Learning activities can have multiple KCs associated with them, and initial models can be created in a number of ways, ranging from a very intuitive approach, to more formal methods like cognitive task analysis, to more experimental mechanisms such as using data mining to automate model discovery. KCs can vary greatly across domains and in complexity. Although it can be possible to break KCs down into ever smaller component parts, it's not necessary (or advisable) to keep decomposing them into their most atomic components. The appropriate grain size should relate to the course or activities being analyzed, and should target new KCs rather than ones in which students are already fluent.
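To make the idea of a KC model concrete, here is a minimal sketch of an activity-to-KC mapping (sometimes called a Q-matrix) in Python. The step IDs, KC names, and dictionary representation are all hypothetical illustrations, not DataShop's actual data format.

```python
# Hypothetical KC model: each problem step maps to one or more knowledge components.
kc_model = {
    "problem_1_step_2": ["calculate-mean"],
    "problem_3_step_1": ["calculate-median"],
    # A single step can exercise multiple KCs at once:
    "problem_5_step_1": ["calculate-mean", "decide-mean-vs-median"],
}

def kcs_for_step(step_id):
    """Return the KCs practiced by a step, or an empty list if unmapped."""
    return kc_model.get(step_id, [])
```

Refining a KC model often amounts to editing exactly this kind of mapping: splitting one KC label into two, merging two labels into one, or relabeling which steps exercise which components.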
Learning Curves
One of the major challenges to measuring and improving learning is that we can't directly observe it. Because learning happens inside a student's brain, it's not something we're able to see. So we're normally forced to rely on proxies for measuring learning: grades, exam scores, attendance.
But tools like grades serve multiple purposes, including credentialing and providing an incentive structure to encourage better study behaviors, and those purposes can be at odds with their usefulness as a learning measure. At the same time, grades and scores are often too high-level to offer meaningful feedback for improving instruction. And when considered as a learning measure, it's not always clear what's being measured; this becomes especially apparent when learning activities are mapped to outcomes and knowledge components. What does it mean to achieve 80% of an outcome?
To make progress in improving instruction, we need to think about learning in a different and more measurable way. One way to do this is to think about learning as the gaining of expertise over time. As a novice, the first time that you attempt to practice a skill, you expect to make a lot of mistakes, and possibly need some help to get it right. As you get more experience, you should make fewer errors and need less help, until eventually you’re able to consistently apply the skill correctly.
Luckily, errors and help are things that we can measure, and when we chart them against a group of students' attempts at solving problems, we end up with a very useful tool for visualizing learning. Called a learning curve, this tool shows the amount of assistance that students required in solving problems for a specific knowledge component (or set of knowledge components). Consider, for example, log data from students' performance on a set of problems designed to teach the knowledge component "decide when to use and calculate mean and median". The first time that students encounter this knowledge component, their assistance score (a combination of errors and hints) is high. But with each subsequent attempt, students need less assistance. This is an ideal learning curve, indicating that learning is occurring, that the activities in question are effective, and suggesting that the underlying model used in creating these activities is sound.
Unfortunately, most learning curves don't look this clean the first time out of the gate; instead, they often take messier, harder-to-read shapes. Fortunately, this is where learning curves really show their value, because these visualizations provide great information on where you can make improvements, either by making changes to activities or by refining your learner models.
A confused, jagged learning curve might suggest that what you originally thought was a single knowledge component was actually two distinct ones; by decomposing it into two separate KCs, we might end up with a learning model that makes more sense.
Or we sometimes see a blip in an otherwise smooth learning curve. This often suggests that a new (and perhaps hidden) skill was introduced in the middle of the problem sequence: a likely candidate for additional scaffolding or practice in the activities.
We also see other, higher-level patterns in learning curves that we can use to categorize and guide our improvements. Low and flat curves suggest that students are being given too much practice on skills they've already mastered; not a good use of their time! Curves that are high and flat, in contrast, don't indicate that learning is occurring; these are often areas where models need to be refined. Learning curves that decline but remain high suggest that students are having difficulty with the knowledge components; these are usually good places to add practice problems. Finally, some learning curves have too little data to interpret; these indicate areas where knowledge components might be merged, or where additional activities would be useful.
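These four patterns can be captured in a rough heuristic. The function below is an illustrative sketch only; the thresholds and decision rules are hypothetical, not the actual rules DataShop uses to categorize curves.

```python
def categorize_curve(assistance_by_opportunity, min_points=4,
                     low_threshold=0.2, flat_threshold=0.1):
    """Rough heuristic categorization of a learning curve.

    Takes a dict mapping opportunity number -> mean assistance score.
    Thresholds are illustrative, not DataShop's actual rules.
    """
    values = list(assistance_by_opportunity.values())
    if len(values) < min_points:
        return "too little data"      # consider merging KCs or adding activities
    start, end = values[0], values[-1]
    drop = start - end                # total decline in assistance
    if end <= low_threshold and drop <= flat_threshold:
        return "low and flat"         # over-practiced: students already fluent
    if drop <= flat_threshold:
        return "high and flat"        # no learning visible: refine the model
    if end > low_threshold:
        return "still high"           # learning, but needs more practice
    return "good learning curve"      # declining toward fluency
```

A real categorizer would also consider fit statistics from a student model rather than raw start and end points, but the sketch shows how each shape maps to a different instructional action.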
Beyond providing guidance on where instruction and activities can be improved, learning curves can also help you assess how successful you've been in improving the course. By comparing learning curves between different versions of your activities or courses, you can measure how effective your improvements have been.
Learning curves are most effective in well-instrumented courses that capture learner interaction data. DataShop provides guidance on importing this kind of log data, and a number of technologies for creating learning activities, especially CTAT tutors and OLI Courses, already include this functionality.
Learning curves track students' progress from novice to expert. They chart the number of errors and requests for help students make while solving problems as they work to acquire a specific knowledge component (a skill, principle, concept, etc.).
Bayesian Knowledge Tracing
Bayesian Knowledge Tracing tracks students' behaviors to make predictions about their learning state. It goes beyond scoring answers right or wrong, using other factors to make inferences: students' past success on similar problems, the likelihood that a student is simply guessing, the difficulty of the skill to be mastered, and others.
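A minimal sketch of the standard BKT update step in Python. After each observed attempt, Bayes' rule revises the probability that the student knows the skill, accounting for guessing and slipping, and then adds the chance that the student learned the skill on this opportunity. The parameter values here are illustrative; in practice they are fit from data.

```python
def bkt_update(p_know, correct, p_guess=0.2, p_slip=0.1, p_transit=0.15):
    """One step of Bayesian Knowledge Tracing.

    p_know:    prior probability the student knows the skill
    p_guess:   probability of a correct answer without knowing the skill
    p_slip:    probability of an incorrect answer despite knowing it
    p_transit: probability of learning the skill on this opportunity
    """
    if correct:
        # P(knew the skill | answered correctly), by Bayes' rule
        num = p_know * (1 - p_slip)
        den = num + (1 - p_know) * p_guess
    else:
        # P(knew the skill | answered incorrectly)
        num = p_know * p_slip
        den = num + (1 - p_know) * (1 - p_guess)
    posterior = num / den
    # Student may also have learned the skill during this opportunity.
    return posterior + (1 - posterior) * p_transit
```

Starting from a prior of 0.3, a correct answer pushes the estimate up (to roughly 0.71 with these parameters) while an incorrect answer pulls it down, which is exactly the inference-beyond-right/wrong behavior described above.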
Additive Factors Model
Additive Factor Modeling (AFM) is a method of modeling learning and performance. AFM is a statistical approach that combines logistic regression with some aspects of item response theory, allowing us to predict student error rates for a given set of learning activities under different KC models. The approach takes into account the difficulty of specific activities, students' current knowledge levels and the rate at which students acquire specific skills.
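In AFM, the log-odds of a correct response are a sum of a student proficiency term plus, for each KC on the step, a KC easiness term and a KC learning-rate term multiplied by the student's prior practice opportunities on that KC. A minimal sketch of that prediction (the parameter values in the example are hypothetical; real values come from fitting the model to log data):

```python
import math

def afm_p_correct(theta, kcs, beta, gamma, opportunities):
    """Additive Factors Model prediction of P(correct) for one student on one step.

    theta:         student proficiency
    beta[k]:       easiness of KC k
    gamma[k]:      learning rate of KC k
    opportunities: prior practice opportunities the student has had on each KC
    """
    logit = theta
    for k in kcs:
        logit += beta[k] + gamma[k] * opportunities[k]
    return 1.0 / (1.0 + math.exp(-logit))
```

With an average student (theta = 0), a moderately hard KC (beta = -0.5) and a learning rate of 0.3 per opportunity, the predicted success rate rises from about 0.38 on the first attempt to about 0.67 after four practice opportunities: the model's version of a declining learning curve.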
Ready to get started?
Designed specifically for instructors and educators, DataLab’s Gradebook Calculator can help you identify issues and adjust your instruction to improve your students’ learning outcomes. LEARN MORE.