Sharing Rich Data Globally

A Data and Methods Repository for Education

The Simon DataLab will create an intellectual data commons to drive continuous improvement in student learning outcomes.

Drawing on the expertise and resources of university, industry, and government members, DataLab partners will collect and store hundreds of high-quality datasets and accumulate the best analytic methods available, thereby creating a large research community devoted to improving learning outcomes through empirical research. This will facilitate the creation, improvement, comparison, and dissemination of the learning metrics, feedback mechanisms, and best practices necessary to fuel the global learning revolution.

DataLab will support a new vision of interdisciplinary collaboration. Building on one of the largest repositories of learner-interaction data available, DataLab aims to become easily accessible to learning researchers and those who develop analytic methods well-suited to educational research.

Learning researchers can contribute data and benefit from the analytic methods developed by others. Course developers can test new methods on hundreds of datasets contributed by many others.

The Simon DataLab will ultimately grow a platform that provides increasing support and infrastructure to improve education through the interdisciplinary, collaborative use of big data.

What can I do with the Simon DataLab?

Course developers can use the Simon DataLab in many ways. Click a link below to see datasets and white papers that provide further information. The datasets and papers are hosted on the Pittsburgh Science of Learning Center's DataShop website, where you can find information about how to log in and use the data.

Improve student learning in my system

DataLab can help you analyze your dataset in several ways to discover how you might improve student learning from your system. First, a simple strategy is to inspect learning curves to see whether any are "low and flat", which implies that students are being asked to do easy tasks repeatedly, potentially wasting their valuable learning time (see Cen et al., 2007).
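The "low and flat" check can be sketched in a few lines of Python. This is only an illustration, not DataLab's own method: it assumes you have already reduced an export to (knowledge component, opportunity number, error) triples, and the function names and the two thresholds (max_error, max_spread) are hypothetical values chosen for the example.

```python
from collections import defaultdict


def learning_curve(rows):
    """Build {kc: [error rate at opportunity 1, 2, ...]}.

    rows is an iterable of (kc, opportunity, error) triples, where error
    is 1 if the student's first attempt was wrong and 0 otherwise.
    """
    # totals[kc][opportunity] = [number of errors, number of observations]
    totals = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for kc, opportunity, error in rows:
        cell = totals[kc][opportunity]
        cell[0] += error
        cell[1] += 1
    return {
        kc: [opps[o][0] / opps[o][1] for o in sorted(opps)]
        for kc, opps in totals.items()
    }


def is_low_and_flat(curve, max_error=0.2, max_spread=0.05):
    """Flag a curve whose error rate is always low and barely changes.

    max_error and max_spread are illustrative thresholds (assumptions),
    not values recommended by the source.
    """
    if not curve:
        return False
    return max(curve) <= max_error and (max(curve) - min(curve)) <= max_spread


def low_and_flat_kcs(rows):
    """Return the sorted names of knowledge components flagged as low and flat."""
    curves = learning_curve(rows)
    return sorted(kc for kc, curve in curves.items() if is_low_and_flat(curve))
```

Knowledge components returned by low_and_flat_kcs are candidates for reduced practice. Inspect the curves by eye before changing a curriculum, since a curve built from few observations can look flat by chance.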

A second, more sophisticated approach is to inspect your learning curves to identify opportunities for improving your knowledge component (KC) model. See Stamper et al. (2011) and watch either of these two videos. Koedinger et al. (2013) describe how an improved KC model was used to redesign a tutor and report an experiment showing that students learn faster and better from the redesigned tutor than from the original. Koedinger & McLaughlin (2010) report a similar result; both papers show how KC model improvements can inspire the design of novel instructional tasks.

A third, automated approach is to employ Learning Factors Analysis (LFA; see Koedinger et al., 2012). If you would like us to apply LFA to your dataset, contact us. There are many other ways researchers have improved their systems and run experiments demonstrating that these improvements work. See the topic Test an instructional principle.

Dataset Paper [pdf]

Assistments Math 2008-2009 Symb-DFA (302 Students)

Seeing language learning inside the math: Cognitiv…

Chinese Radical Transfer Fall 2007

Using Optimally Selected Drill Practice to Train B…

Chinese Vocabulary Fall 2006

The FaCT (Fact and Concept Training) System: A new…

Cog Model Discovery Experiment Fall 2011

Using data-driven discovery of better student mode…

Cog Model Discovery Experiment Spring 2010

Human-machine student model discovery and improvem…

Cog Model Discovery Experiment Spring 2010 [KRM]

Human-machine student model discovery and improvem…

Contiguity CWCTC Spring 2006

Integrating visual and verbal knowledge during cla…

Geometry Area (1996-97)

Chania, Greece. Jun 19-21, 2012. pp. 17-24. …

Is Over Practice Necessary? — Improving Learning …

Geometry Area (1996-97) [LASI 13]

Human-machine student model discovery and improvem…

Joint Explanation - Electric Fields - Pitt - Spring 2007

Trialog: How Peer Collaboration Helps Remediate Er…

Shall we explain? Augmenting learning from intelli…

Middle School Gaming the System (2 schools, 4 lessons) 2002-2005 v1

Generalizing Detection of Gaming the System Across…

Detecting the Moment of Learning. Proceedings of t…

Developing a Generalizable Detector of When Studen…

Adapting to When Students Game an Intelligent Tuto…

Pittsburgh Science of Learning Center Stoichiometry Study 1

When is Assistance Helpful to Learning? Results i…

When and How Often Should Worked Examples be Given…

REAP ELI Reading 4 Fall 2006

Automatically Generating and Validating Reading-C…

REAP ELI Reading 4 Spring 2006

Language Learning: Challenges for Intelligent Tut…

Choosing Reading Passages for Vocabulary Learning…

REAP ELI Reading 4 Summer 2006

Self-Assessment in Vocabulary Tutoring. Ninth Int…

REAP ELI Reading Fall 2007

A Selection Strategy to Improve Cloze Question Qu…

Self Explanation - Electric Fields - USNA - Spring 2006

Explaining self-explaining: A contrast between con…

USNA Physics Fall 2008

The Andes physics tutoring system: Lessons Learned…

Test a theory of performance or learning

If, for example, you want to test whether a power law or an exponential function better fits learning data, you might use DataLab datasets as follows. Export data from a dataset (e.g., Geometry Area, 1996-1997), open it in a software package such as MATLAB or R, and use modeling programs, such as generalized linear regression, to compare alternate versions of your theory. You can find instructions on how to read an exported file into R here.
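As a rough illustration of such a comparison, the Python sketch below fits both curve shapes to a series of error rates and reports which one leaves the smaller squared error. It is a simplification, not the generalized linear regression mentioned above: each model is fit by ordinary least squares on log-transformed error rates, so every error rate must be greater than zero. The function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sum_squared_error(predicted, observed):
    """Sum of squared differences between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


def compare_fits(opportunities, error_rates):
    """Fit a power law and an exponential; return each model's squared error.

    Power law:   error = a * opportunity ** b   (linear in log-log space)
    Exponential: error = a * exp(b * opportunity) (linear in semi-log space)
    Assumes every opportunity number and error rate is greater than zero.
    """
    log_err = [math.log(e) for e in error_rates]

    p_slope, p_int = fit_line([math.log(o) for o in opportunities], log_err)
    power_pred = [math.exp(p_int) * o ** p_slope for o in opportunities]

    e_slope, e_int = fit_line(list(opportunities), log_err)
    exp_pred = [math.exp(e_int + e_slope * o) for o in opportunities]

    return {
        "power": sum_squared_error(power_pred, error_rates),
        "exponential": sum_squared_error(exp_pred, error_rates),
    }


def better_fit(opportunities, error_rates):
    """Name the model with the smaller squared error on the original scale."""
    fits = compare_fits(opportunities, error_rates)
    return min(fits, key=fits.get)
```

Both models here have two free parameters, so comparing squared error is a fair first pass. For a publishable comparison you would more likely fit per-student or per-KC models and compare them with a criterion such as AIC or BIC.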

Dataset Paper [pdf]

Algebra I 2005-2006 (Hampton only)

Evaluating a simulated student using real students…

Assistments Math 2004-2005 (912 Students)

Why are algebra word problems difficult? Using tut…

Assistments Math 2005-2006 (3136 Students)

Why students engage in "gaming the system": Behavi…

The Composition Effect: Conjunctive or Compensatory…

Assistments Math 2008-2009 Symb-DFA (302 Students)

Seeing language learning inside the math: Cognitiv…

CMU 36-201: Statistical Reasoning and Practice - Spring 2007

The impact of spurious correlations on students' p…

Chinese Radical Transfer Fall 2007

Using Optimally Selected Drill Practice to Train B…

Chinese Vocabulary Fall 2006

Optimizing knowledge component learning using a dy…

The FaCT (Fact and Concept Training) System: A new…

Cog Model Discovery Experiment Spring 2010

Human-machine student model discovery and improvem…

Cog Model Discovery Experiment Spring 2010 [KRM]

Human-machine student model discovery and improvem…

Geometry Angles - Fox Chapel 1998

An effective metacognitive strategy: Learning by d…

Geometry Area (1996-97) [LASI 13]

Human-machine student model discovery and improvem…

Geometry Hampton 2005-2006

More Accurate Student Modeling Through Contextual …

Improving Contextual Models of Guessing and Slippi…

Middle School Gaming the System (2 schools, 4 lessons) 2002-2005 v1

Why Students Engage in "Gaming the System" Behavio…

Self Explanation - Electric Fields - USNA - Spring 2006

Self-explaining in the classroom: Learning curve e…

Analyze process data from an experiment

Many hypotheses on learning are tested through in vivo experimentation with data stored in DataLab. Within DataLab, users can create samples from subsets of the data and compare different conditions. When separate samples are created for the experimental conditions, selecting them all yields learning curves and performance profiler charts for each.

You can see examples of the kinds of analyses that researchers have performed by clicking on the show related datasets and papers link below and reading one of those papers. For example, McLaren et al. (2008) show results of analyzing process data to see whether experimental conditions produce different patterns of hint requests (Table 5) or different amounts of example study or problem solving (Table 6). One way to do such an analysis is to export the dataset from the Export tab. You may want to export one of the smaller "rollup" exports, such as the student-problem rollup or the student-step rollup, which give you higher-level summary data. You can open the export in your favorite tool, such as R or Excel (e.g., use a pivot table with condition in the rows, Knowledge Component in the columns, and average of hints in the cells).
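The same pivot table can be built outside a spreadsheet. The Python sketch below uses only the standard library and rests on assumptions: that the export is a tab-delimited text file with a header row, and that the columns are named "Condition", "KC", and "Hints". Those names are placeholders, so check the header of your own export and pass the real column names.

```python
import csv
import io
from collections import defaultdict


def mean_hints_by_condition_and_kc(lines, condition_col="Condition",
                                   kc_col="KC", hints_col="Hints"):
    """Return {condition: {kc: mean hints}} from a tab-delimited export.

    lines is any iterable of text lines (an open file or io.StringIO).
    The default column names are assumptions made for this example.
    """
    # sums[(condition, kc)] = [total hints, number of rows]
    sums = defaultdict(lambda: [0.0, 0])
    for row in csv.DictReader(lines, delimiter="\t"):
        key = (row[condition_col], row[kc_col])
        sums[key][0] += float(row[hints_col])
        sums[key][1] += 1

    table = defaultdict(dict)
    for (condition, kc), (total, count) in sums.items():
        table[condition][kc] = total / count
    return dict(table)


# A tiny in-memory stand-in for an exported rollup file.
sample_export = io.StringIO(
    "Condition\tKC\tHints\n"
    "control\tarea\t2\n"
    "control\tarea\t4\n"
    "treatment\tarea\t1\n"
    "treatment\tperimeter\t0\n"
)
pivot = mean_hints_by_condition_and_kc(sample_export)
```

With a real export, replace the StringIO object with an open file, for example open(path, newline=""), where path points to your downloaded rollup.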

Error rates, times, and hints can also be viewed by condition in learning curves or the performance profiler by creating samples for each condition and selecting those. An example dataset that has condition samples is Digital Games for Improving Number Sense - Study 1 (on the Learning Curve tab, inspect the two existing samples and try turning them on and off).

Dataset Paper [pdf]

Assistments Math 2008-2009 Symb-DFA (302 Students)

Seeing language learning inside the math: Cognitiv…

CPS Algebra I 2005

Evaluating collaborative extensions to the Cogniti…

Evaluating collaborative extensions to the Cogniti…

Scripting collaborative problem solving with the C…

Chinese Radical Transfer Fall 2007

Using Optimally Selected Drill Practice to Train B…

Chinese Vocabulary Fall 2006

The FaCT (Fact and Concept Training) System: A new…

Contiguity CWCTC Spring 2006

Integrating visual and verbal knowledge during cla…

Geometry Angles - Fox Chapel 1998

Pilot-testing a tutorial dialogue system that supp…

Limitations of student control: Do students know w…

Toward meta-cognitive tutoring: A model of help se…

Toward tutoring help seeking. In J.C. Lester, R.M….

An architecture to combine meta-cognitive and cogn…

Modeling students' metacognitive errors in two int…

The help tutor: Does metacognitive feedback impro…

Can help seeking be tutored? Searching for the sec…

Towards computer-based tutoring of help-seeking sk…

A Response Time Model for Bottom-Out Hints as Work…

An effective metacognitive strategy: Learning by d…

Investigations into Help Seeking and Learning with…

Geometry Angles - North Hills Spring 2003

Evaluating the effectiveness of a tutorial dialogu…

Geometry Area (1996-97)

Is Over Practice Necessary? — Improving Learning …

Help Tutor CWCTC Spring 2006 (meta)

Can help seeking be tutored? Searching for the sec…

Middle School Gaming the System (2 schools, 4 lessons) 2002-2005 v1

Modeling students' metacognitive errors in two int…

Detecting the Moment of Learning. Proceedings of t…

Developing a Generalizable Detector of When Studen…

Pittsburgh Science of Learning Center Stoichiometry Study 1

When and How Often Should Worked Examples be Given…

Self Explanation - Electric Fields - USNA - Spring 2006

Self-explaining in the classroom: Learning curve e…

Explaining self-explaining: A contrast between con…

Test an instructional principle

The best way to test an instructional principle is to run a randomized controlled experiment with a control condition that does not employ the principle and an otherwise-identical treatment condition that does. Many such studies have been run, as illustrated in the associated papers listed below. One benefit of log data is that it provides information on process in addition to the outcome data present in post-tests. This data can enhance explanations of results. For example, one potential benefit of worked examples is that students can process them faster than matched problems; log data can show whether they do, and whether they process them too fast. If you are interested in a particular principle, a study may have already been done that you can use as a jumping-off point.

Dataset Paper [pdf]

Chinese Radical Transfer Fall 2007

Using Optimally Selected Drill Practice to Train B…

Chinese Vocabulary Fall 2006

The FaCT (Fact and Concept Training) System: A new…

Contiguity CWCTC Spring 2006

Integrating visual and verbal knowledge during cla…

Motivation and Metacognition in Chinese Vocabulary Learning, Experiment 2

A Dynamical System Model of Microgenetic Changes i…

Pittsburgh Science of Learning Center Stoichiometry Study 1

When is Assistance Helpful to Learning? Results i…

When and How Often Should Worked Examples be Given…

Studying the effects of personalized language and …

Can a Polite Intelligent Tutoring System Lead to I…

REAP ELI Reading 4 Spring 2006

Choosing Reading Passages for Vocabulary Learning…

REAP ELI Reading Fall 2007

A Selection Strategy to Improve Cloze Question Qu…

Self Explanation - Electric Fields - USNA - Spring 2006

Explaining self-explaining: A contrast between con…

USNA Physics Fall 2006

Out of the Lab and into the Classroom: An Evaluati…

Test a theory of motivation

A number of researchers have found clever ways to detect student motivational or affective states from log data; see the papers by Baker and the associated datasets below. Others have run experiments comparing different instructional treatments designed to enhance student engagement or motivation (e.g., see papers by McLaren et al.). Many interesting open questions remain, such as whether timing gaps in the data indicate thoughtfulness or disengagement.
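A first step toward studying timing gaps is simply to locate them. The Python sketch below lists the pauses in one student's transaction timestamps that exceed a threshold. It assumes timestamps in "YYYY-MM-DD HH:MM:SS" form, and the 60-second default is an arbitrary illustrative cutoff rather than an established one. Deciding whether a given pause reflects thought or disengagement is the open research question and is not attempted here.

```python
from datetime import datetime


def long_gaps(timestamps, threshold_seconds=60):
    """Return (start of gap, gap length in seconds) for each long pause.

    timestamps holds one student's transaction times as ISO-style strings,
    e.g. "2006-03-01 10:00:00" (an assumed format for this example).
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    gaps = []
    # Compare each transaction with the one that follows it.
    for earlier, later in zip(times, times[1:]):
        seconds = (later - earlier).total_seconds()
        if seconds > threshold_seconds:
            gaps.append((earlier.isoformat(sep=" "), seconds))
    return gaps
```

Once located, each gap can be joined back to the step that preceded it (its difficulty, whether a hint was just shown, whether the next attempt was correct) to look for patterns that separate the two interpretations.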

Dataset Paper [pdf]

Algebra I 2005-2006 (Hampton only)

Differences Between Intelligent Tutor Lessons, and…

Educational Software Features that Encourage and D…

Assistments Math 2005-2006 (3136 Students)

Why students engage in "gaming the system": Behavi…

Middle School Gaming the System (2 schools, 4 lessons) 2002-2005 v1

Generalizing Detection of Gaming the System Across…

Modeling and Understanding Students' Off-Task Beha…

Why Students Engage in "Gaming the System" Behavio…

Developing a Generalizable Detector of When Studen…

Detecting Student Misuse of Intelligent Tutoring S…

Off-Task Behavior in the Cognitive Tutor Classroom…

Pittsburgh Science of Learning Center Stoichiometry Study 1

Studying the effects of personalized language and …

Can a Polite Intelligent Tutoring System Lead to I…

REAP ELI Reading 4 Spring 2006

Choosing Reading Passages for Vocabulary Learning…

Analyze data from another system to get ideas

Datasets provide examples of different kinds of activities and instructional methods. Analyzing data that is related to your interests (e.g., similar content or similar technology) may give you ideas for better instructional development. Similarly, analysis of datasets may inspire research ideas. Try out exploratory data analysis techniques on a dataset, including using DataLab tools like the Performance Profiler or the Error Report as well as exporting a dataset (transaction level is most detailed) and using your favorite tool(s) for exploratory data analysis (e.g., pivot tables in Excel).

Dataset Paper [pdf]

Middle School Gaming the System (2 schools, 4 lessons) 2002-2005 v1

Developing a Generalizable Detector of When Studen…

USNA Physics Fall 2008

The Andes physics tutoring system: Lessons Learned…