Predicting Cognitive Performance in Open-ended Dynamic Tasks
A Modeling Comparison Challenge
Organized by: Christian Lebiere, Cleotilde Gonzalez, & Walter Warwick


Method

Here is a rundown of the overall method for this challenge:

1. Download the Dynamic Stock and Flows (DSF) task - An executable copy of the DSF task is available for download to all participants. The DSF task environment requires a Windows platform but uses a TCP/IP socket protocol to communicate with external models (a minimal client sketch appears after step 4 below). See the text-based socket protocol documentation.

In addition, a description of the dynamics of the task is provided in the “Human performance data for model calibration” section of this webpage. Participants are free to implement their own version of the DSF task, but whatever model they develop must ultimately interact with our version of the DSF task via the published TCP/IP socket protocol.

Finally, two versions of the DSF task are available, as explained on the "DSF: The Dynamic Stock & Flows Task" page: the DSFForSockets.zip file supports socket connections with external models; the second version includes the GUI used by human subjects, so that participants in this challenge can experience the DSF task for themselves.

2. Create a model to interact with the task - Once participants have established a connection to the DSF environment, they can calibrate their models by running them against the "calibrating" protocols described in the Human Performance Data section and comparing model performance against the human performance data in those conditions. In this way, participants will be able to gauge whether their models are capable of simulating the basic effects seen in human control of the DSF task.

3. Refine your model as needed - Our past experience suggests that this will lead to an iterative development process in which models are continually refined as they are run under different experimental protocols and against different data sets. Modelers are free to experiment with any variation of the task that is allowed under the existing protocol and implementation, but no data will be provided for any condition besides the “calibrating” protocols.

4. Model comparison - Model comparison begins only after participants are satisfied with the performance they have achieved on the calibrating data. At that point, but no later than May 15, 2009, participants will submit an executable version of their model through the website to be run against novel protocols. The DSF task supports several interesting variants, including but not limited to: different inflow and outflow functions, control delays, the addition of "noise" to the inflow and outflow amounts, and another agent controlling the environmental inflows and outflows. The choice of specific novel conditions will be entirely at our discretion, and submitted models will be run under these conditions as submitted.
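For concreteness, here is a minimal sketch, in Python, of an external model connected to the DSF task over a TCP/IP socket. The host, port, message layout, and reply format shown here are illustrative assumptions only; the published text-based socket protocol documentation is the authoritative reference for the actual message formats.

import socket

# All values below are illustrative assumptions; consult the published
# text-based socket protocol documentation for the real host, port, and
# message format used by the DSFForSockets version of the task.
HOST = "localhost"   # DSF task and the model are assumed to run on the same machine
PORT = 9000          # placeholder port number

def decide(fields):
    # Placeholder decision rule; a real cognitive model replaces this with
    # its own computation of the user inflow and outflow for the trial.
    return 0.0, 0.0

def run_model():
    # Connect to the running DSF task and answer each trial message.
    with socket.create_connection((HOST, PORT)) as sock:
        stream = sock.makefile("rw", newline="\n")
        for line in stream:                      # one text message per trial (assumed)
            fields = line.strip().split()        # parse the fields the protocol sends
            user_inflow, user_outflow = decide(fields)
            stream.write(f"{user_inflow} {user_outflow}\n")   # reply format is assumed
            stream.flush()

if __name__ == "__main__":
    run_model()

A real entry would replace decide() with the model's own decision mechanism and parse whatever fields the protocol actually sends on each trial.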

Our goal for this blind evaluation under the novel conditions is not to hamstring participants, but to see how well their models generalize without the benefit of continual tweaking or tuning, and to test the models' predictive power for conditions in which no data were available. Assessing robustness under the transfer conditions is an important factor to consider when we investigate the invariance of modeling approaches. Again, the transfer experimental conditions and the corresponding data will not be known to modelers prior to evaluation. Their purpose is to evaluate the generality and scalability of the models to a range of conditions beyond those for which data are available during model development and calibration.

We will rank all participants according to a quantitative measure of goodness of fit to the transfer data. That said, goodness of fit under the calibrating and transfer conditions is not the only factor we will use in our comparison effort. In addition to their models, participants will be required to submit written accounts of their development efforts and detailed explanations of the mechanisms their models implement. We recognize that it is difficult to explain the workings of a cognitive model in a compact and understandable manner to people who might be unfamiliar with the paradigm in which it was developed, but it is exactly that level of detail that is required to understand what has been accomplished and to judge the implications of the model’s assumptions for its ability to model the task.
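The announcement does not name the specific goodness-of-fit measure. As one common choice, root-mean-squared error (RMSE) between a model's stock trajectory and the averaged human stock trajectory could be computed as in the short Python sketch below; the trajectories shown are hypothetical.

import math

def rmse(model_stock, human_stock):
    # Root-mean-squared deviation between two stock time series of equal length.
    assert len(model_stock) == len(human_stock)
    squared_errors = [(m - h) ** 2 for m, h in zip(model_stock, human_stock)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical trajectories: mean stock level per time step.
model = [4.2, 4.8, 5.1, 4.9]
human = [4.0, 5.0, 5.0, 5.0]
print(f"RMSE = {rmse(model, human):.3f}")   # lower values indicate a better fit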

The top-ranking model according to the purely quantitative goodness-of-fit criterion will automatically be invited to the symposium at ICCM. In addition, we will invite, at our discretion, another two entries to the symposium based on a mixture of quantitative fit, qualitative capture of important effects in the data, theoretical soundness, and cognitive plausibility, as well as the desire to showcase a diversity of modeling approaches.

Organized in part by the Dynamic Decision Making Laboratory, a part of the Social and Decision Sciences Department at Carnegie Mellon University. For updates and comments regarding this website, please email hauyuw@andrew.cmu.edu.