The Simon DataLab will create an intellectual data commons to drive continuous improvement in student learning outcomes.
Drawing on the expertise and resources of university, industry, and government members, DataLab partners will collect and store hundreds of high-quality data sets and accumulate the best analytic methods available, thereby creating a large research community devoted to improving learning outcomes through empirical research. This will facilitate the creation, improvement, comparison, and dissemination of the learning metrics, feedback mechanisms, and best practices needed to fuel the global learning revolution.
DataLab will support a new vision of interdisciplinary collaboration. Building on one of the largest repositories of learner-interaction data available, DataLab aims to be easily accessible both to learning researchers and to developers of analytic methods suited to educational research.
Learning researchers can contribute data and benefit from the analytic methods developed by others. Course developers can test new methods on hundreds of datasets contributed by many others.
The Simon DataLab will ultimately grow a platform that provides increasing support and infrastructure to improve education through the interdisciplinary, collaborative use of big data.
Course developers using the Simon DataLab have many choices. Click a link below to see datasets and white papers providing further information. The datasets and papers are hosted on the Pittsburgh Science of Learning Center's DataShop website, where you can find information about how to log in and use the data.
There are many ways DataLab can help you analyze your dataset to discover how you might improve student learning in your system. First, a simple strategy is to inspect learning curves for any that are "low and flat", implying that students are repeatedly asked to perform easy tasks, potentially wasting their valuable learning time (see Cen et al., 2007).
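As a rough illustration, the "low and flat" check can be sketched in a few lines of Python. The error rates and thresholds below are made up for the example, not taken from any DataShop export; a real analysis would compute per-opportunity error rates from a student-step export.

```python
# Hypothetical error rates (fraction incorrect) by practice opportunity for
# three knowledge components (KCs), as one might compute from a student-step
# export. The names and numbers are illustrative only.
curves = {
    "circle-area":    [0.55, 0.40, 0.28, 0.20, 0.14],  # healthy decline
    "square-area":    [0.08, 0.07, 0.06, 0.07, 0.06],  # low and flat
    "trapezoid-area": [0.60, 0.55, 0.52, 0.50, 0.49],  # high and flat
}

def is_low_and_flat(errors, low=0.20, flat=0.05):
    """Flag curves that start easy (first error rate below `low`) and barely
    improve (total drop below `flat`): candidates for reducing required
    practice. Thresholds are arbitrary choices for this sketch."""
    return errors[0] < low and (errors[0] - errors[-1]) < flat

flagged = [kc for kc, errs in curves.items() if is_low_and_flat(errs)]
print(flagged)  # only square-area starts low and stays flat
```

A "high and flat" curve like trapezoid-area above is a different symptom (students are not learning the KC), which the KC-model methods in the next paragraph address.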
A second, more sophisticated approach is to inspect your learning curves to identify opportunities for improving your knowledge component (KC) model. See Stamper et al. (2011) and watch either of these two videos. Koedinger et al. (2013) describes how an improved KC model was used to redesign a tutor and reports an experiment showing that students learn faster and better from the redesigned tutor than from the original. Koedinger & McLaughlin (2010) provides a similar result; both show how KC model improvements can inspire the design of novel instructional tasks.
A third, automated approach is to employ Learning Factors Analysis (LFA; see Koedinger et al., 2012). If you would like us to apply LFA to your dataset, contact us. There are many other ways researchers have improved their systems and run experiments demonstrating that these improvements work. See the topic Test an instructional principle.
If, for example, you want to test whether a power law or an exponential function better fits learning data, you might use DataLab data sets as follows: export data from a dataset (e.g., Geometry Area, 1996-1997), open it in a software package like Matlab or R, and use model-fitting routines, such as generalized linear regression, to compare the alternative versions of your theory. You can find instructions on how to read an exported file into R here.
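The comparison can be sketched without any statistics package: both candidate functions become straight lines after a log transform, so an ordinary least-squares fit in the appropriate space shows which form fits better. The error rates below are synthetic (generated from a power law for the sake of the example), not real Geometry Area data.

```python
import math

def ols(xs, ys):
    """Simple least-squares fit of y = m*x + c; returns (m, c, sse)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    c = my - m * mx
    sse = sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
    return m, c, sse

# Synthetic per-opportunity error rates, generated here from a power law;
# in practice these would come from an exported dataset.
opportunities = list(range(1, 11))
errors = [0.6 * t ** -0.8 for t in opportunities]

log_e = [math.log(e) for e in errors]
# Power law  y = a * t^-b      is linear in (log t, log y);
# exponential y = a * e^(-b*t) is linear in (t, log y).
*_, sse_power = ols([math.log(t) for t in opportunities], log_e)
*_, sse_exp = ols(opportunities, log_e)

better = "power" if sse_power < sse_exp else "exponential"
print(better)  # the data were generated from a power law, so power wins
```

Note that comparing residuals in log space is a simplification; with real, noisy data you would use nonlinear regression and a proper model-comparison criterion.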
Many hypotheses about learning are tested through in vivo experimentation, with the resulting data stored in DataLab. Within DataLab, users can create samples from subsets of the data and compare experimental conditions within it. When a separate sample is created for each experimental condition, selecting them all yields a learning curve and a Performance Profiler chart for each condition.
You can see examples of the kinds of analyses researchers have performed by clicking the show related datasets and papers link below and reading one of those papers. For example, MacLaren et al. (2008) show results of analyzing process data to see whether experimental conditions produce different patterns of hint requests (Table 5) or different amounts of example study versus problem solving (Table 6). One way to do such an analysis is to export the dataset from the Export tab. You may want to use one of the smaller "rollup" exports, such as the student-problem rollup or the student-step rollup, which give you higher-level summary data. You can open the export in your favorite tool, such as R or Excel (e.g., use a pivot table with condition in the rows, Knowledge Component in the columns, and average of hints in the cells).
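The pivot-table layout described above (condition in rows, KC in columns, mean hints in cells) can also be built directly in Python. The rows below are invented stand-ins for a student-step rollup; the column names and values are not from any real export.

```python
from collections import defaultdict

# Hypothetical student-step rollup rows: (condition, knowledge_component, hints).
rows = [
    ("example", "circle-area", 0), ("example", "circle-area", 1),
    ("example", "square-area", 0),
    ("problem", "circle-area", 2), ("problem", "circle-area", 3),
    ("problem", "square-area", 1),
]

# Cross-tabulate: group hint counts by (condition, KC), then average each cell,
# mirroring the Excel pivot-table layout described above.
cells = defaultdict(list)
for cond, kc, hints in rows:
    cells[(cond, kc)].append(hints)
table = {key: sum(v) / len(v) for key, v in cells.items()}

print(table[("problem", "circle-area")])  # 2.5
```

With these made-up numbers the problem condition averages more hints per step than the example condition, the kind of contrast MacLaren et al. (2008) examine in Table 5.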
Error rates, times, and hints can also be viewed by condition in learning curves or the performance profiler by creating samples for each condition and selecting those. An example dataset that has condition samples is Digital Games for Improving Number Sense - Study 1 (on the Learning Curve tab, inspect the two existing samples and try turning them on and off).
The best way to test an instructional principle is to run a randomized controlled experiment with a control condition that does not employ the principle and an otherwise-identical treatment condition that does. Many such studies have been run, as the associated papers listed below illustrate. One benefit of log data is that it provides information on process in addition to the outcome data available from post-tests. Such process data can enhance explanations of results (e.g., one potential benefit of worked examples is that students can process them faster than matched problems: do they? is it too fast?). If you are interested in a particular principle, a study may already have been done that you can use as a jumping-off point.
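The worked-example question above (do students process examples faster than matched problems?) is exactly the kind of process analysis log data enables. A minimal sketch, with entirely made-up step durations:

```python
import statistics

# Hypothetical step durations (seconds) from transaction logs, one list per
# experimental condition. Real durations would come from timestamps in a
# transaction-level export.
durations = {
    "worked-example": [12.0, 15.5, 11.2, 14.8, 13.1],
    "problem":        [24.3, 30.1, 27.8, 22.5, 29.0],
}

means = {cond: statistics.mean(xs) for cond, xs in durations.items()}
faster = min(means, key=means.get)
print(faster, round(means["problem"] - means["worked-example"], 2))
```

A real analysis would of course add a significance test and, per the "is it too fast?" question, check whether very short durations signal shallow processing rather than efficiency.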
A number of researchers have found clever ways to detect student motivational or affective states from log data. See papers by Baker and associated datasets, below. Others have run experiments comparing different instructional treatments designed to enhance student engagement or motivation (e.g., see papers by McLaren et al.). Many interesting open questions remain, for example, whether timing gaps in the data indicate thoughtfulness or disengagement.
Datasets provide examples of different kinds of activities and instructional methods. Analyzing data related to your interests (e.g., similar content or similar technology) may give you ideas for better instructional development. Similarly, analysis of datasets may inspire research ideas. Try exploratory data analysis techniques on a dataset: use DataLab tools like the Performance Profiler or the Error Report, or export a dataset (the transaction level is most detailed) and use your favorite tool(s) for exploratory data analysis (e.g., pivot tables in Excel).
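A first exploratory cut on a transaction-level export often starts with simple counts, such as which problems draw the most hint requests. The rows below are invented for illustration and do not reflect any actual export's columns or values.

```python
from collections import Counter

# Hypothetical transaction-level export rows: (problem, transaction_type).
transactions = [
    ("P1", "ATTEMPT"), ("P1", "HINT"), ("P1", "ATTEMPT"),
    ("P2", "ATTEMPT"), ("P2", "ATTEMPT"),
    ("P3", "HINT"), ("P3", "HINT"), ("P3", "ATTEMPT"),
]

# Count hint requests per problem; frequent hint use on a problem is a
# natural starting point for a closer look with the Error Report.
hint_counts = Counter(p for p, t in transactions if t == "HINT")
print(hint_counts.most_common(1))  # P3 draws the most hint requests
```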