Evaluating Cognitive Models and Architectures
Marc Halbrügge, Dipl.-Psych.
marc.halbruegge@unibw.de
Human Factors Institute, Universität der Bundeswehr München
July 2007
Cognitive architectures in general Evaluation Methods for Cognitive Models Can a Cognitive Architecture be Falsified?
Topics of this talk
Main points
Evaluation of ...
- Models: Current approaches to quantify model complexity are not sufficient.
- Architectures: Cognitive architectures cannot be falsified and are therefore too weak to be considered as theories.
Marc Halbrügge Evaluating Cognitive Models and Architectures
Contents
1. Cognitive architectures in general
2. Evaluation Methods for Cognitive Models
   - Model Complexity and Overfitting
   - The Markov Chain Example
3. Can a Cognitive Architecture be Falsified?
   - The Quicksort Example
   - Conclusions
Cognitive Modeling
The Goal
The main goal of cognitive modeling is to simulate human cognition.
Evaluating Cognitive Models
Goodness of Fit
Concordance with human data (a high fit) is generally thought to be necessary, but not sufficient, for model validation [Roberts and Pashler, 2000].

Complexity and Overfitting
To assess the fit of a model or to compare fits between models, it is important to be aware of the complexity of the models. Models with many degrees of freedom usually achieve better fits, but bear the risk of overfitting.
Overfitting
Results for the three classifiers (N=269)
df: misclassifications (obs. data / true data)
too low: 9 / 10
low: 6 / 5
high: 11
Model Complexity and Overfitting
Complexity in Cognitive Modeling
Baker et al. (2003): Count the number of free weights of the subsymbolic part of an ACT-R model and use this as the degrees of freedom when computing the Bayesian information criterion (BIC).

BUT: Measures of complexity like the BIC are aimed at closed-form mathematical models. Cognitive models are sequences of actions!
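As a reminder of how the parameter count enters the criterion, the BIC can be sketched as below. This is a minimal illustration, not the procedure of Baker et al. (2003): the function name `bic` and all numbers are hypothetical. Only the number of free parameters k appears in the penalty term; nothing about the sequences of actions a model can produce enters the formula.

```python
import math

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: k * ln(n) - 2 * ln(L).

    log_likelihood: maximized log-likelihood ln(L) of the model
    k: number of free parameters (here: free subsymbolic weights)
    n: number of observations
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return k * math.log(n) - 2.0 * log_likelihood

# Hypothetical numbers: two models with the same fit but different weight counts.
same_fit = -120.0
bic_two_weights = bic(same_fit, k=2, n=100)
bic_four_weights = bic(same_fit, k=4, n=100)
```

With equal fit, the model with more weights always receives the higher (worse) BIC, whatever its actual behavioral flexibility.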
Memory Retrieval in ACT-R
Two cognitive models that retrieve chunks from declarative memory. Which chunk is retrieved next depends on the production that currently fires. The models stop when the memory retrieval fails.
[Figure: Model 1 and Model 2]
Addressing Complexity
Following the Baker et al. approach, Model 2 (four states) should be considered the more complex one, because it has more numerical weights.

But:
- Model 2 can only create sequences like abcdabcda.
- Model 1 can output any sequence of a’s and b’s.

The difference is present even if you use model run time as the dependent variable. Imagine that two of the four productions take much longer than the others.
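The contrast can be made concrete with a small enumeration. The sketch below is not the ACT-R code of the two models: the transition tables are assumptions read off the description above (Model 1: either chunk may follow either chunk; Model 2: a fixed cycle a, b, c, d), the number of chunks is used as a hypothetical stand-in for the number of numerical weights, and the stopping rule (retrieval failure) is replaced by a fixed sequence length.

```python
# Assumed transition structure of the two models.
# Model 1: two chunks, each may be followed by either chunk.
MODEL_1 = {"a": ["a", "b"], "b": ["a", "b"]}
# Model 2: four chunks that are visited in a fixed cycle.
MODEL_2 = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}

def sequences(model, length, start="a"):
    """Return the set of all output sequences of `length` beginning at `start`."""
    if length < 1:
        return set()
    found = {start}
    for _ in range(length - 1):
        # Extend every sequence by each chunk that may follow its last chunk.
        found = {seq + nxt for seq in found for nxt in model[seq[-1]]}
    return found

def chunk_count(model):
    """Number of chunks, used here as a stand-in for the number of weights."""
    return len(model)

n_seq_1 = len(sequences(MODEL_1, 9))  # every a/b sequence starting with "a"
n_seq_2 = len(sequences(MODEL_2, 9))  # only the cycle itself
```

Under these assumptions, the model with fewer chunks generates 256 different sequences of length nine, while the model with more chunks generates exactly one, abcdabcda.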
Model 1 (two states)
[Figure: histogram of model run times; x-axis: time (s), y-axis: Frequency]
Model 2 (four states)
[Figure: histogram of model run times; x-axis: time (s), y-axis: Frequency]
Discussion
- Very simple models (about 100 lines of code)
- It is very hard to assess model complexity when only the number of chunks and productions is known.
- Knowing the number of influential parameters is not sufficient for estimating model complexity.
End of part 2
The Newell Test
Newell (1990, p. 507): “play a cooperative game of anything you can do, I can do better”

The Newell Test
Anderson and Lebiere (2003) proposed a list of 12 criteria for the evaluation of cognitive architectures, e.g. flexible behavior, natural language, learning, and evolution. An architecture is considered valid if it meets all these criteria.
The Newell Test
Although Anderson and Lebiere make some important points, they miss the problem of falsifiability. Before we ask the question “How do we validate a cognitive architecture?” we should answer the question “Can we validate a cognitive architecture at all?”
Or, according to Popper,
“Can we falsify a cognitive architecture?”
Can we falsify a Cognitive Architecture?
We use ACT-R as an example of an architecture. The argument should apply to any other architecture as well.
- ACT-R has Turing power: ACT-R is a Turing machine.
- Therefore anything (that is computable) can be done with ACT-R.
- ACT-R cannot be falsified.
What can be done (with cognitive architectures)
Recursive programming with ACT-R 6
To show that ACT-R can easily do things that are very hard or even infeasible for humans, we created a model that performs a quicksort on arbitrary lists.
The Quicksort algorithm
function quicksort(q)
    var list below, pivotList, above
    if length(q) <= 1
        return q
    select a pivot value pivot from q
    for each x in q
        if x < pivot then add x to below
        if x = pivot then add x to pivotList
        if x > pivot then add x to above
    return concatenate(quicksort(below), pivotList, quicksort(above))

Source: http://en.wikipedia.org/wiki/Quicksort
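For readers who want to run the algorithm, the pseudocode above translates almost line by line into Python. This is a sketch for illustration only, not the ACT-R model from the talk; the choice of the middle element as pivot is an assumption, since the pseudocode leaves the pivot selection open.

```python
def quicksort(q):
    """Sort a list with the three-way partition used in the pseudocode."""
    if len(q) <= 1:
        return list(q)
    pivot = q[len(q) // 2]  # assumption: take the middle element as pivot
    below = [x for x in q if x < pivot]
    pivot_list = [x for x in q if x == pivot]
    above = [x for x in q if x > pivot]
    # Recurse on the two outer partitions and concatenate the results.
    return quicksort(below) + pivot_list + quicksort(above)

result = quicksort([3, 1, 4, 1, 5, 9, 2, 6])
```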
Recursion in ACT-R
The idea
A chunk of the goal type can be either an element of a data list or an element of the goal list that emulates the goal stack.

(chunk-type element data prev root)
(chunk-type (qs-goal (:include element)) state goal-root prev-goal)
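The general technique, replacing recursion by an explicit list of pending goals in which each goal points to the previous one, can be sketched outside of ACT-R as follows. This is a hypothetical Python analogy, not the model's code: the class `Goal`, the flag `is_sorted`, and the function name are invented for illustration, and only the field `prev_goal` loosely mirrors the `prev-goal` slot shown above.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Goal:
    """One pending goal; `prev_goal` links to the goal below it on the stack."""
    data: List[int]
    is_sorted: bool = False
    prev_goal: Optional["Goal"] = None

def quicksort_with_goal_list(values):
    """Quicksort without recursion: a linked list of goals emulates the stack."""
    result = []
    top = Goal(list(values))
    while top is not None:
        goal, top = top, top.prev_goal  # pop the current goal
        if goal.is_sorted or len(goal.data) <= 1:
            result.extend(goal.data)
            continue
        pivot = goal.data[0]  # assumption: first element as pivot
        below = [x for x in goal.data if x < pivot]
        equal = [x for x in goal.data if x == pivot]
        above = [x for x in goal.data if x > pivot]
        # Push in reverse order so that `below` is handled first.
        top = Goal(above, prev_goal=top)
        top = Goal(equal, is_sorted=True, prev_goal=top)
        top = Goal(below, prev_goal=top)
    return result

sorted_values = quicksort_with_goal_list([3, 1, 4, 1, 5, 9, 2, 6])
```

Each subgoal is created with a pointer to the goal that was current before it, so finishing a goal simply means continuing with the one it points to.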
[Figure: time (s) plotted against n * log(n)]
Discussion of the quicksort example
Anderson and Lebiere (2003) state that ACT-R is not a “general computational system that can be programmed to do anything” (page 597). The quicksort example shows that ACT-R as such is not limited in this way. Seen from another perspective: if ACT-R were a valid cognitive architecture (CA), then what about LISP, or C?
Conclusions
- Cognitive architectures are not theories. They should be regarded as theory languages, similar to programming languages.
- Architectures are useful for model creation, the communication between modelers, and the combination of models. But when it comes to validation, the focus should not lie on architectures but on models.
- In order to evaluate and compare cognitive models, we need a way to assess their complexity. As long as we lack a formal methodology to do this, we should alternatively release the code of our models.
References
Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C., and Qin, Y. (2004). An integrated theory of the mind. Psychological Review, 111(4):1036–1060.
Anderson, J. R. and Lebiere, C. (2003). The Newell test for a theory of cognition. Behavioral and Brain Sciences, 26(5):587–601.
Baker, R. S., Corbett, A. T., and Koedinger, K. R. (2003). Statistical techniques for comparing ACT-R models of cognitive performance. In Proceedings of the 10th Annual ACT-R Workshop.
Newell, A. (1990). Unified Theories of Cognition – The William James Lectures, 1987. Harvard University Press, Cambridge, MA, USA.
Popper, K. R. (1989). Logik der Forschung. Mohr, Tübingen, 9th edition.
Roberts, S. and Pashler, H. (2000). How persuasive is a good fit? A comment on theory testing. Psychological Review, 107(2):358–367.