Predicting Cognitive Performance in Open-ended Dynamic Tasks
Results

We are organizing a special issue of the Journal of General Artificial Intelligence on model comparison for cognitive architectures and AGI (see the Call for Papers for more details). The purpose of this special issue is to explore the merits of a comparative approach to understanding Artificial General Intelligence (AGI) systems. We welcome submissions from those who participated in the DSF model comparison challenge, as well as from those who are in a position to comment on the following general topics relevant to model comparison within the context of the DSF challenge:

• Many computational fields have seen the emergence of challenge tasks that prod the development of new techniques and measure their progress toward a goal (e.g., RoboCup). What are the requirements of such challenge tasks for AGI? Should they provide independent tests of specific capacities, integrated tests of functionality, or both?

• Progress is often measured by the relative evaluation of alternatives in a common setting. But what are the constraints on such comparisons for cognitive models? Are acceptable mechanisms limited to those judged cognitively - or even biologically - plausible? Should the complexity of a model be taken into account? Which levels of description are acceptable? Should models aim to predict human performance in new conditions, or is suitable post hoc reproduction of known performance data sufficient?

• The methodology developed by cognitive psychology for evaluating fits of models to human data depends strongly on experimental control and scales poorly to complex, open-ended tasks. Sets of criteria for evaluating cognitive architectures have been proposed, but specific instantiations on AGI-level tasks have been lacking.

• Human behavior models based on cognitive architectures are usually developed for very specific tasks and at substantial effort to the modeler. While cognitive architectures keep being refined, cumulative progress in the form of model reuse has been elusive. New mechanisms and/or practices for composing and/or generalizing models of simple tasks are required for scaling up to models suitable for general, open-ended intelligence.

• Despite their stated goal of providing an integrated theory of human intelligence, specific cognitive architectures are usually applied to a relatively narrow set of cognitive activities, often laboratory tasks. Attempts to apply cognitive architectures to open-ended, naturalistic environments (using virtual or robotic embodiments) have raised substantial issues about their robustness and scalability beyond laboratory environments.

Human Experiment Data
The spreadsheet below contains data from two experiments: the Sequence experiment and the Delay experiment.

The Sequence experiment contains three conditions. All conditions start with 4 gallons of water in the tank, have a goal of 6 gallons, an Environmental Outflow of zero, and a total of 100 trials. Base payment was $5, plus a performance bonus of 2.5 cents per trial, for a maximum total of $7.50 over 100 trials (approximately 30 minutes). The three conditions are as follows:

The Delay experiment contains two conditions. All conditions start with 4 gallons of water, have a goal of 6 gallons, an Environmental Outflow of zero, and a total of 100 trials. Base payment was $5, plus a performance bonus of 2.5 cents per trial, for a maximum total of $7.50 over 100 trials (approximately 30 minutes). The Environmental Inflow function was a linearly increasing function that deposited water into the tank, rising from 2 to 10 gallons over the course of the 100 trials. The two treatments are:

Each participant in this data set (identified by an ID number from 1-120) was tested in one condition of each experiment and assigned the letter "a" or "b" in each experimental condition. The overall design was counterbalanced so that not all participants received a Sequence condition first (see the conditions spreadsheet for the full design). An ID number paired with "a" in a condition's data set, e.g., "58a," indicates that subject 58 was tested in that condition first, followed by the condition that contains the ID "58b."

Important note about missing data points: 15 data points are blank in the Delay=3 condition data spreadsheet. This is the result of human participants overloading the program with exponentially large numbers and causing the DSF task to crash. We are unable to recover those last data points.

Sequence=2

Data From Participating Models

The following contains documentation on the individual output variables found in the output files below:
Output files of participating models for 20 subject runs:
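For reference, the Environmental Inflow schedule described for the Delay experiment (a linear rise from 2 to 10 gallons over 100 trials) can be sketched as below. The exact interpolation used by the DSF task is not documented here, so the per-trial linear mapping and endpoint conventions are assumptions:

```python
def environmental_inflow(trial, n_trials=100, start=2.0, end=10.0):
    """Assumed linear inflow: 'start' gallons on trial 1 rising by a
    constant step each trial to 'end' gallons on trial n_trials."""
    if not 1 <= trial <= n_trials:
        raise ValueError("trial must be between 1 and n_trials")
    step = (end - start) / (n_trials - 1)  # constant per-trial increment
    return start + step * (trial - 1)
```

Under these assumptions, `environmental_inflow(1)` returns 2.0 and `environmental_inflow(100)` returns 10.0, with intermediate trials spaced evenly between the two endpoints.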