Predicting Cognitive Performance in Open-ended Dynamic Tasks
A Modeling Comparison Challenge
Organized by: Christian Lebiere, Cleotilde Gonzalez, & Walter Warwick

Results

We are organizing a special issue of the Journal of Artificial General Intelligence on model comparison for cognitive architectures and AGI (see the Call for Papers for more details). The purpose of this special issue is to explore the merits of a comparative approach for understanding Artificial General Intelligence (AGI) systems. We welcome submissions from those who participated in the DSF model comparison challenge as well as from those who are in a position to comment on the following general topics relevant to model comparison within the context of the DSF challenge:

• Many computational fields have seen the emergence of challenge tasks to prod the development of new techniques and to measure progress toward a common goal (e.g., RoboCup). What are the requirements of such challenge tasks for AGI? Should they provide independent tests of specific capacities, integrated tests of functionality, or both?

• Progress is often measured by the relative evaluation of alternatives in a common setting. But what are the constraints of such comparisons for cognitive models? Are acceptable mechanisms limited to those that are judged cognitively - or even biologically - plausible? Should the complexity of a model be taken into account? Which levels of description are acceptable? Should models aim to predict human performance in new conditions, or is suitable post hoc reproduction of known performance data sufficient?

• The methodology developed by cognitive psychology for evaluating fits of models to human data is strongly dependent upon experimental control and scales poorly to complex, open-ended tasks. Sets of criteria for evaluating cognitive architectures have been proposed, but specific instantiations on AGI-level tasks have been lacking.

• Human behavior models based on cognitive architectures are usually developed for very specific tasks and at substantial cost in modeler effort. While cognitive architectures continue to be refined, cumulative progress in the form of model reuse has remained elusive. New mechanisms and/or practices for composing and/or generalizing models of simple tasks are required to scale up to models suitable for general, open-ended intelligence.

• Despite their stated goal of providing an integrated theory of human intelligence, specific cognitive architectures are usually applied to a relatively narrow set of cognitive activities, often laboratory tasks. Attempts to apply cognitive architectures to open-ended, naturalistic environments (using virtual or robotic embodiments) have raised substantial issues about their robustness and scalability beyond laboratory environments.

Human Experiment Data

The spreadsheets below contain data from two experiments: the Sequence experiment and the Delay experiment.

The Sequence experiment contains three conditions. All conditions start with 4 gallons of water in the tank, have a goal of 6 gallons, an Environmental Outflow of zero, and a total of 100 trials. Base payment was $5, and the performance bonus was 2.5 cents per trial, for a maximum total of $7.50 over 100 trials (approx. 30 minutes). The three conditions are as follows (a generative sketch follows the list):
1) Sequence=2: The Environmental Inflow function repeated the sequence 1, 5, 1, 5, ... for 100 trials.
2) Sequence=2+Noise: The Environmental Inflow function was 1±1, 5±1, 1±1, 5±1, ... for 100 trials, so the realized sequence could be 0 or 2, 4 or 6, 0 or 2, 4 or 6, and so on. The +1 and -1 noise values were each applied on 50% of trials.
3) Sequence=4: The Environmental Inflow function repeated the sequence 0, 4, 2, 6, 0, 4, 2, 6, ... for 100 trials.
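
For concreteness, here is a minimal Python sketch of how the three Environmental Inflow functions could be generated. The function names are our own illustration, not the challenge software:

    import random

    def sequence_2(trials=100):
        # Condition 1: repeats 1, 5, 1, 5, ... for the given number of trials.
        return [1 if t % 2 == 0 else 5 for t in range(trials)]

    def sequence_2_noise(trials=100):
        # Condition 2: the same 1, 5, ... base with +1 or -1 noise on every
        # trial, yielding values such as 0/2 and 4/6. The experiment balanced
        # +1 and -1 50/50 across trials; random.choice only approximates this.
        return [base + random.choice([-1, 1]) for base in sequence_2(trials)]

    def sequence_4(trials=100):
        # Condition 3: repeats 0, 4, 2, 6, 0, 4, 2, 6, ... for 100 trials.
        pattern = [0, 4, 2, 6]
        return [pattern[t % 4] for t in range(trials)]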

The Delay experiment contains two conditions. All conditions start with 4 gallons of water in the tank, have a goal of 6 gallons, an Environmental Outflow of zero, and a total of 100 trials. Base payment was $5, and the performance bonus was 2.5 cents per trial, for a maximum total of $7.50 over 100 trials (approx. 30 minutes). The Environmental Inflow function was a linearly increasing function that deposited from 2 to 10 gallons of water into the tank over the course of the 100 trials. The two conditions (see the sketch after this list) are:
1) Delay=2: All user inflow and outflow decisions were delayed until the trial after decisions were submitted.
2) Delay=3: All user inflow and outflow decisions were delayed until two trials after decisions were submitted.
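
The following is a minimal sketch, under our reading of the condition definitions above and the DelayUI/DelayUO convention documented further below, of how delayed decisions interact with the tank. The linear inflow schedule and the update order are our assumptions, not the challenge code:

    from collections import deque

    def run_delay_condition(decisions, delay=2, start=4.0, trials=100):
        # decisions: one (user_inflow, user_outflow) pair per trial.
        # delay follows the DelayUI/DelayUO convention: 1 = executed the
        # same trial, 2 = one trial later, 3 = two trials later.
        amount = start
        in_transit = deque([(0.0, 0.0)] * (delay - 1))  # decisions in transit
        history = []
        for t in range(trials):
            env_inflow = 2.0 + 8.0 * t / (trials - 1)  # assumed linear 2 -> 10
            in_transit.append(decisions[t])
            user_in, user_out = in_transit.popleft()
            amount += env_inflow + user_in - user_out  # EnvOutflow is zero
            history.append(amount)
        return history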

Each participant in this data set (identified by an ID number from 1-120) was tested in one condition of each experiment and assigned the letter "a" or "b" in each experimental condition. The overall design was counterbalanced so that not all participants received a Sequence condition first (see the conditions spreadsheet for the full design). An ID paired with "a" (e.g., "58a") indicates that subject 58 was tested in that condition first, followed by the condition whose data contains the ID "58b."
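
A trivial, purely illustrative helper for splitting these condition IDs:

    def parse_condition_id(cond_id):
        # "58a" -> (58, "a"): subject 58 ran this condition first;
        # "58b" marks the condition that subject ran second.
        return int(cond_id[:-1]), cond_id[-1]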

Important note about missing data points: There are 15 blank data points in the Delay=3 condition data spreadsheet. This is the result of human participants overloading the program with exponentially large numbers and causing the DSF task to crash. We were unable to recover those final data points.

Sequence=2
Sequence=2+Noise
Sequence=4
Delay=2
Delay=3

Data From Participating Models

The following documents the individual output variables found in the output files below (a loading sketch follows the list):
  • Subject- A numeric ID given to each subject/model run as an identifier.
  • Version- BATCH = the simulation runs faster and does not show the DSF interface; NONBATCH = the simulation shows the DSF interface and runs slower.
  • Trial- There are 100 trials per subject run.
  • CurrentBonus- Current amount of bonus ($0.05) earned on that trial, based on whether the GoalAmount was met.
  • TotalBonus- Total accumulated bonus earned up to the given trial, starting with a base of $5.
  • UserInflow- Amount (in gallons) designated by subject/model as Inflow value.
  • UserOutflow- Amount (in gallons) designated by subject/model as Outflow value.
  • EnvInflow- Amount (in gallons) of Inflow placed into the DSF tank by the program, outside of the subject/model's control.
  • EnvOutflow- Amount (in gallons) of Outflow taken out of the DSF tank by the program, outside of the subject/model's control.
  • Amount- Total amount (in gallons) of water in the DSF tank at the end of the trial after Inflows and Outflows are executed for that trial.
  • BackOrder- Any backorder (Inflow) amount that would enter the DSF tank in conditions with some delay in the inflow.
  • TotalBackOrder- Total amount (in gallons) of accumulated BackOrder up to the given trial.
  • ForwardDischarge- Any forward discharge (Outflow) amount that would leave the DSF tank in conditions with some delay in the outflow.
  • TotalForwardDischarge- Total amount (in gallons) of accumulated ForwardDischarge up to the given trial.
  • GoalAmount- The amount (in gallons) designated as the goal of the DSF task.
  • DelayUI- Amount of time delay (in number of trials) it takes for the UserInflow action to be executed (i.e., a value of 1 means the action is executed on the trial the values are submitted; 2 means the values submitted are executed the trial after entry).
  • DelayUO- Amount of time delay (in number of trials) it takes for the UserOutflow action to be executed (i.e., a value of 1 means the action is executed on the trial the values are submitted; 2 means the values submitted are executed the trial after entry).
  • GoalUpperBound- The permissible amount (in gallons) over the GoalAmount that is considered to be the acceptable upper limit of the goal range.
  • GoalLowerBound- The permissible amount (in gallons) under the GoalAmount that is considered to be the acceptable lower limit of the goal range.
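
As a starting point for analysis, here is a hedged Python sketch for loading a model output file and sanity-checking the bonus bookkeeping. It assumes the files are CSV with one row per trial and one column per variable above; adjust the reader if your download is in another spreadsheet format:

    import csv

    def load_runs(path):
        # Group rows by Subject; each value is that run's list of trials.
        runs = {}
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                runs.setdefault(row["Subject"], []).append(row)
        return runs

    def check_bonus(trials):
        # TotalBonus should equal the $5 base plus the accumulated
        # CurrentBonus values up to each trial (see definitions above).
        total = 5.0
        for row in trials:
            total += float(row["CurrentBonus"])
            assert abs(float(row["TotalBonus"]) - total) < 1e-6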

Output files of participating models for 20 subject runs:
Model 01
Model 02
Model 03
Model 04
Model 05
Model 06
Model 07
Model 08
Model 09

Organized in part by the Dynamic Decision Making Laboratory, a part of the Social and Decision Sciences Department at Carnegie Mellon University. For updates and comments regarding this website, please email hauyuw@andrew.cmu.edu.