
An introduction to computational psycholinguistics: Modeling human sentence processing

Shravan Vasishth University of Potsdam, Germany http://www.ling.uni-potsdam.de/∼vasishth vasishth@acm.org September 2005, Bochum

“Background check”

Who among you:

  • 1. have some knowledge of behavioral-experiment design (e.g., factorial design)?
  • 2. have ever carried out behavioral experiments?
  • 3. are familiar with statistical data analysis techniques, e.g. analysis of variance, Chi-square tests, etc.?

  • 4. have a background in programming (lisp, prolog)?
  • 5. are familiar with connectionist models (in general)?
  • 6. are familiar with probabilistic context free grammars?
  • 7. are familiar with cognitive modeling in psychology?
  • 8. have some background in parsing? know what a context-free grammar is?
  • 9. have some background in syntactic theory (any kind)?


What this course is about

Think about these sentences. There’s something strange about them.

(1)

  • a. The president awarded the prize died.
  • b. After the student moved the chair broke.
  • c. The friend of the shopkeeper who went to Japan on a holiday turned out to be very rich.

  • d. The daughter of the colonel, who was standing by the window, looked up.

(2)

  • a. The man the dog saw died.
  • b. The man the dog the cat bit saw died.
  • c. The man died that the dog saw that the cat bit.


The problem

  • Natural language is rife with ambiguity and complexity.
  • Yet we humans (“normal” humans) have no problem processing language most of the time. Sometimes, we do.
  • What is the brain doing that it can take as input a sometimes ridiculously ambiguous sentence and, within a few hundred milliseconds, understand how it parses, its semantics and its implicatures and presuppositions? And generate a response immediately?
  • Clearly it’s doing something right. And we want to know what it is. For several reasons.
  • It’ll make you rich. DEMO: try the following sentence in the ERG online demo: The passenger offered a free ride in compensation for his troubles has refused to accept the offer.
  • It’s also an amazingly cool problem per se: what exactly is going on in the brain when we parse sentences?


Since we don’t care about getting rich. . .

if you do, leave now before it’s too late

  • We’re going to look at attempts to answer this question from different perspectives: cognitive psychology, artificial intelligence, maybe even linguistic theory.
  • All these attempts generate predictions. And these need to be evaluated, which brings up the issue of methods: methods for testing predictions (experiment design), for measuring the responses, and for analyzing them. So we’ll look at these briefly.
  • The kinds of theories that have come up fall roughly into four baskets: discursive (“paper-pencil”; these may be specific enough to be implementable, but not necessarily), probabilistic, connectionist, and cognitive architectures (ACT-R).

Note: this is a very selective tour. If I don’t mention some work it doesn’t mean it isn’t important.


Some central issues in real-time sentence processing

  • Ambiguity
  • Serial versus parallel parsing
  • Incrementality and prediction
  • Complexity
  • The underlying explanation for processing difficulty


Ambiguity

  • We’ve already seen some examples:

(3)

  • a. LEXICAL: John was standing by the bank.
  • b. HARD TO RECOVER GARDEN PATH: The horse raced past the barn fell. (Bever 1970)
  • c. EASY TO RECOVER GARDEN PATH: After the student moved the chair broke.
  • d. SYNTACTIC ATTACHMENT AMBIGUITY: The daughter of the colonel who was standing by the window.

  • Two generalizations regarding syntactic ambiguity resolution: minimal attachment, and early/late closure (Frazier 1979):


Minimal attachment and late closure

Minimal attachment: Choose the structurally simplest analysis (fewest additional nodes).

(4)

  • a. HARD TO RECOVER GARDEN PATH: The horse raced past the barn fell. (Bever 1970)
  • b. SYNTACTIC ATTACHMENT AMBIGUITY: The daughter of the colonel who was standing by the window.

Late closure: Integrate current input into current constituent (when possible).

(5) After the student moved the chair broke.

Note that prosody (an intonational phrase boundary after moved) plays a critical role. Even when we read we are using prosody silently (Fodor 2002). One piece of evidence for this comes from relative clause attachment ambiguity.
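To make the Minimal Attachment idea concrete, here is a rough sketch that counts nodes in two candidate analyses of the prefix "The horse raced" and prefers the smaller one. The tuple encoding of trees, the two simplified structures, and the function names are assumptions made for this illustration; they are not Frazier's formalization.

```python
# Toy illustration of Minimal Attachment: prefer the analysis with fewer nodes.
# Trees are nested tuples (label, child, ...); leaves are plain strings.
# Both candidate structures are simplified and hypothetical.

def count_nodes(tree):
    """Count the non-terminal nodes in a tree."""
    if isinstance(tree, str):      # a word, not a node
        return 0
    return 1 + sum(count_nodes(child) for child in tree[1:])

def minimal_attachment(candidates):
    """Return the name of the candidate analysis with the fewest nodes."""
    return min(candidates, key=lambda name: count_nodes(candidates[name]))

# "The horse raced ..." analysed as a main clause.
main_clause = ("S",
               ("NP", ("Det", "the"), ("N", "horse")),
               ("VP", ("V", "raced")))

# The same words analysed as an NP containing a reduced relative clause.
reduced_relative = ("S",
                    ("NP",
                     ("NP", ("Det", "the"), ("N", "horse")),
                     ("RC", ("VP", ("V", "raced")))))

candidates = {"main clause": main_clause, "reduced relative": reduced_relative}
preferred = minimal_attachment(candidates)
```

Under these assumed structures the main-clause analysis needs fewer nodes, so it is the one chosen, which is the analysis that later fails at "fell".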


Relative clause attachment ambiguities and prosody

  • In English there is a preference for local attachment:

(6) Someone hit the maid of the actress who was on the balcony.

  • However, in Spanish (7) the preference is for non-local attachment to the first noun, criada (Cuetos & Mitchell, 1988, among others).

(7) Alguien pegó a   la  criada de la  actriz  que estaba en el  balcón
    someone hit  dat the maid   of the actress who was    on the balcony
    ‘Someone hit the maid of the actress who was on the balcony.’

  • Some other languages that behave like Spanish include French (Mitchell, Cuetos, & Zagar, 1990), Italian (De Vincenzi & Job, 1993), German (Hemforth, Konieczny, & Scheepers, 1994), and Dutch (Brysbaert & Mitchell, 1996).
  • Hindi (Vasishth, Agnihotri, Fernández, & Bhatt, 2005) shows local attachment for postnominal RCs, but no preference for participial RCs:


Relative clause attachment ambiguities and prosody

(8)

  • a. kisii-ne    usa  abhinetrii-kii usa  naukaraanii-ko [jo caaye pii      rahii-thii] maara
       someone-erg that actress-gen    that maid-acc       who tea   drinking was         hit
       ‘Somebody hit the actress’ maid who was drinking tea.’
  • b. kisii-ne    [caaye pii      rahii] usa  abhinetrii-kii usa  naukaraanii-ko maara
       someone-erg tea    drinking was    that actress-gen    that maid-acc       hit
       ‘Somebody hit the actress’ maid who was drinking tea.’

In fact, there are at least six possible positions for relative clauses, and a production study (Vasishth, Fernández, and Féry, in progress) finds that all possibilities were used by subjects producing semi-spontaneous RC constructions. Plus some others that syntacticians would dismiss as ungrammatical, but that’s another story.


Relative clause attachment ambiguities and prosody

There is evidence in these languages that subjects tend to make a prosodic break before the RC in non-local attachment languages, but not in local attachment ones. In fact, in English, one can induce nonlocal attachment with a prosodic break (see also Fernández, Bradley, & Taylor, 2005; Fernández, Bradley, Igoa, & Teira, 2003).

(9) The daughter of the colonel, who was standing by the window, looked up.

So, even in this tiny, tiny problem, clearly the story is more complex than explanations like Minimal Attachment and Late Closure. How to answer this question?


Some possible approaches

  • Give up and start work in some other area.
  • Ignore counterexamples (or pretend they are not there) and doggedly stick to your theory.
  • Turn the observation into an explanation: “Our revised theory is that local attachment is preferred unless prosody decides otherwise.”


A saner alternative

Build a precise, working process model (ideally an implementation) that (a) can be falsified, (b) generates novel (preferably surprising) and nontrivial predictions. And test them. If the model works, try to break it with new predictions. If it breaks, enhance it, regression-test it (it should not lose any coverage it has; the empirical coverage should increase monotonically), and then generate more predictions and test them.

[Even in detailed process models it’s easy to re-cast the observation as the explanation, and there are all kinds of fiddle factors like free numerical parameters. They are not the solution to all our scientific problems, but they’re better in general than discursive models.]
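The "enhance it, then regression-test it" loop can be sketched in a few lines. The two findings below are the English and Spanish attachment preferences mentioned earlier; the two "model versions" are hypothetical stand-ins that simply return a preference, where a real process model would compute it. All names are assumptions for illustration.

```python
# Sketch of regression-testing a model against known findings.
# FINDINGS encodes the observed attachment preference per language;
# both model versions are placeholders, not real process models.

FINDINGS = [
    ("English", "local"),
    ("Spanish", "non-local"),
]

def model_v1(language):
    """First version: Late Closure only, so always local attachment."""
    return "local"

def model_v2(language):
    """Hypothetical revised version with a language-sensitive preference."""
    return "non-local" if language == "Spanish" else "local"

def covered(model):
    """Return the set of findings whose observed preference the model predicts."""
    return {language for language, observed in FINDINGS
            if model(language) == observed}

def no_lost_coverage(old_model, new_model):
    """True if the new model still covers everything the old one covered."""
    return covered(old_model) <= covered(new_model)
```

A revision passes the regression test only if `no_lost_coverage(old, new)` holds; whether the revised model explains anything, or merely restates the observation, is a separate question.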


Some possible approaches: Discursive models

Why not just stick to discursive models? It turns out it’s damn hard to figure out what the predictions of a discursive model are. A phrase one often hears when one presents a counterexample to a theory: “Well, it depends on how you construe the theory.” This is a useful and productive hedge when we are in the process of theorizing, but in the end, without an implementation (on paper or on computer) we will never commit to a position, rendering the theory unfalsifiable.

Moving on to the other central issues. . .


Memory capacity and ambiguity resolution

The limited capacity of working memory has been a central motivating factor in the development of theories of ambiguity in sentence processing. Frazier suggested that Minimal Attachment and Late Closure are reflexes of a constrained-capacity working memory system. Regarding Late Closure, she says (Frazier, 1979, 39):

“It is a well-attested fact about human memory that the more structured the material to be remembered, the less burden the material will place on immediate memory. Hence, by allowing incoming material to be structured immediately, Late Closure has the effect of reducing the parser’s memory load.”


Memory capacity and ambiguity resolution

Similarly, regarding Minimal Attachment (Frazier, 1979, 40): “[T]he Minimal Attachment strategy not only guarantees minimal structure to be held in memory, but also minimizes rule accessing. Hence, [Minimal Attachment is also an ‘economical’ strategy] in the sense that [it reduces] the computation and memory load of the parser.”


Human Memory in Cognitive Psychology

  • Experimental and theoretical work in cognitive psychology gives us very general, modality- and task-independent characterizations of human information processing. See Neath (2003) for an overview of the major memory models in psychology.
  • Sentence processing research makes relatively little contact with the existing work on memory. An important exception is the work of McElree and colleagues (McElree, 2000; McElree & Dosher, 1993; McElree, Foraker, & Dyer, 2003).
  • Two important ideas I will focus on in this course: activation decay and interference. Although decay and interference have long been considered to be competing explanations for forgetting (e.g., Brown, 1958; Peterson & Peterson, 1959; Keppel & Underwood, 1962; Waugh & Norman, 1965), there is considerable evidence supporting the idea that both factors play a role in information processing (see Anderson & Lebiere, 1998; Altmann & Schunn, 2002, and references cited there).
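To give "activation decay" a concrete shape, here is a sketch of the base-level activation equation commonly used in ACT-R, B = ln(Σ t_j^(-d)), where t_j is the time since the j-th use of a memory item and d is the decay rate. The slides themselves do not state this equation; its use here, the value d = 0.5, and the example times are assumptions for illustration. Interference is not modeled in this sketch.

```python
import math

def base_level_activation(times_since_use, decay=0.5):
    """ACT-R style base-level activation: ln(sum of t ** -decay).

    times_since_use: positive elapsed times (e.g. seconds) since each use.
    decay: decay rate d; 0.5 is an assumed, conventional value.
    """
    return math.log(sum(t ** -decay for t in times_since_use))

recent = base_level_activation([1.0])          # used once, 1 s ago
older = base_level_activation([4.0])           # used once, 4 s ago
rehearsed = base_level_activation([4.0, 1.0])  # used twice
```

Activation drops as the last use recedes into the past and rises again with each additional use, which is the qualitative behavior that matters for the processing accounts discussed later.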


An example where memory research is underexplored in psycholinguistics

Gibson and colleagues noticed that the preference for RC attachment has the following constraint (X < Y means that X is preferred to Y as an attachee):

house (NP3) < lamp (NP1) < painting (NP2)

(10) The lamp (NP1) near the painting (NP2) of the house (NP3) [that was damaged in the flood]

Various explanations have been offered (Hemforth and colleagues, Miyamoto and colleagues), but they all fall in the general category of finding out what the linguistic constraints are: discourse salience (Hemforth), locality and predicate proximity (Gibson). . .

Here’s an alternative from cognitive psychology that has rarely been considered in the field (cf. Lewis & Nakayama, 2001; Simon Dennis’ work), and it has nothing to do with linguistics and everything to do with serial order information:


Example: Start-End Model

Henson (1998, 1999): The Start-End Model is a positional theory of order memory, whereby an item’s position is defined relative to the start and end of a sequence in memory. Recall is driven by positional cues, with the best matching token(s) being selected for recall.

An important prediction of this model is that, when items need to be recalled, recall accuracy is U-shaped, with the highest accuracy for items at the start and the end of a list.

Open research question: can we define a process model that combines these linguistic constraints and cognitive constraints (which are independent of language), and can cover a wide range of empirical phenomena in sentence processing?
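The intuition behind the U-shaped prediction can be illustrated with a toy calculation: give every list position a two-part code (a start marker that fades along the list and an end marker that fades backwards from the end), and ask how distinct each position's code is from all the others. This is only a simplified illustration of positional coding, not Henson's model or its recall mechanism; the parameter values, the similarity function, and the distinctiveness measure are all assumptions.

```python
import math

def position_codes(n, s0=1.0, s=0.8, e0=0.6, e=0.48):
    """Two-part code per position: (start-marker, end-marker) strength.

    The start marker is strongest at position 1 and decays towards the end;
    the end marker is strongest at position n and decays towards the start.
    Parameter values are illustrative assumptions.
    """
    return [(s0 * s ** (i - 1), e0 * e ** (n - i)) for i in range(1, n + 1)]

def distinctiveness(codes):
    """For each position, 1 / (summed similarity to all positions, itself included)."""
    scores = []
    for a in codes:
        # Similarity falls off exponentially with distance between codes.
        total = sum(math.exp(-math.dist(a, b)) for b in codes)
        scores.append(1.0 / total)
    return scores

scores = distinctiveness(position_codes(6))
```

With these assumed values, the first and last positions come out more distinctive than any position in the middle, giving the U shape; whether the same holds for attachment sites in a sentence is exactly the open question.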


Serial versus parallel parsing

  • Serial: compute a single analysis, and if that fails, backtrack and compute a new analysis.
  • Parallel:
    – Ranked: compute all analyses in parallel, but rank them (e.g. by likelihood).
    – Pruned: discard low-ranked analyses using, e.g., beam search.
    – Don’t prune or rank at all: generate all possible structures and then compute a function over them (entropy reduction) to find the optimal one (Hale, 2003).

It is currently controversial whether humans use serial or parallel parsing. Read Gibson & Pearlmutter (2000) and Lewis (2000) to get a sense of the issues.
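The difference between these options can be sketched with a toy ranked parser that prunes with a beam. A beam of width 1 behaves like a serial parser that has committed to its best guess; a wider beam carries alternatives along. The analyses, their probabilities, and the lookup tables standing in for a grammar are invented for illustration.

```python
# Toy ranked-parallel parser with beam pruning.
# Analyses and probabilities are hypothetical; the dictionaries stand in
# for what a real grammar and parser would compute.

def step(beam, options, width):
    """Extend every analysis in the beam by one word, then keep the best `width`.

    beam:    list of (analysis, probability) pairs
    options: dict mapping an analysis to its (continuation, probability) pairs
    """
    extended = [
        (new_analysis, prob * p)
        for analysis, prob in beam
        for new_analysis, p in options.get(analysis, [])
    ]
    extended.sort(key=lambda pair: pair[1], reverse=True)   # rank by likelihood
    return extended[:width]                                  # prune

# "The horse raced past the barn fell."
AT_RACED = {"start": [("main verb", 0.9), ("reduced relative", 0.1)]}
AT_FELL = {"reduced relative": [("reduced relative + main verb 'fell'", 1.0)]}
# "main verb" has no entry in AT_FELL: that analysis cannot integrate "fell".

def parse(width):
    """Run the two ambiguous steps with the given beam width."""
    beam = [("start", 1.0)]
    for options in (AT_RACED, AT_FELL):
        beam = step(beam, options, width)
    return beam

serial_like = parse(width=1)   # only the top-ranked analysis survives each word
parallel = parse(width=2)      # both analyses are carried along
```

With width 1 the beam is empty at "fell", so the parser would have to backtrack; with width 2 the low-ranked analysis is still available. The sketch says nothing about which option humans use.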


Incrementality and prediction

  • Humans don’t wait to compute the parse of a sentence; they do it incrementally. This is partly the reason that garden paths occur.
  • Not only is structure built incrementally, structure is predicted in real time, with non-monotonic adjustments/revisions.

(11) Der Mann, der einen Bart . . .
     ‘The man who . . . a beard . . .’


Complexity

What makes the second sentence harder to parse than the first? Length?

(12)

  • a. The man the dog saw died.
  • b. The man the dog the cat bit saw died.
  • c. The man died that the dog saw that the cat bit.


Some central issues in real-time sentence processing

  • Ambiguity: what mechanism allows us to resolve ambiguity quickly?
  • Serial versus parallel parsing
  • Incrementality and prediction: how does it work, how does it affect processing?
  • Complexity: what makes an (unambiguous) sentence harder to comprehend?
  • The underlying explanation for processing difficulty


Something strange about how I illustrated the issue about ambiguity

We started with a big question: how does the brain process language? But we quickly descended into very specific issues like relative clause attachment ambiguity. Newell (1973) noticed this tendency in psychology. There was too much focus on specific issues, and the real goal was somehow lost in these tiny, never-ending controversies. Our top goal is to understand how the mind works. Towards this end, he proposed some criteria [not an exhaustive listing of criteria!!!] for cognitive theories (adaptation here is by Anderson and Lebiere (2003)).


Newell’s criteria for cognitive theories

A functionally complete theory of cognition should have at least these properties:

  • 1. Behaves as an (almost) arbitrary function of the environment
  • 2. Operates in real time
  • 3. Exhibits effective adaptive behavior
  • 4. Uses vast amounts of knowledge about the environment
  • 5. Behaves robustly in the face of error, the unexpected, the unknown
  • 6. Integrates diverse knowledge
  • 7. Uses natural language
  • 8. Exhibits self-awareness and a sense of self
  • 9. Learns from the environment
  • 10. Acquires capabilities through development
  • 11. Arises through evolution
  • 12. Is realizable within the brain


Cognitive Architectures as unifying explanations

An example of the kind of thing that bothered Newell: A theory of short-term memory correctly predicts the serial position curve observed in a particular experiment, but cannot explain human behavior when processing connected text in which serial order information is critical.

So the suggestion he makes is to make your theory of X a part of a completely specified system. One way of looking at his idea is to consider building a composite theory (striving for functional completeness) and to regression-test it in the context of the main goal: understanding how the mind works. We will look into some examples of this approach in this course.


We can take a break at this point if you want


How to test your theory?

  • 1. It’s perhaps an annoying aspect of doing science that we have to evaluate our theories. The days of armchair evaluation are, thankfully, over, at least in sentence processing research.
  • 2. Evaluation can be annoying because it is expensive: time, money, effort, expertise.
  • 3. Let’s spend a few minutes to understand the kind of evaluation methods currently in use.


Methodology: Experimental design

Factorial designs are the most common. Example:

  • You want to evaluate the effect of type of relative clause on processing. Suppose you want to compare subject versus object relatives. This is one factor (relative clause type) and has two levels.
  • You also want to determine if the right-branching versus center embedded versions of these differ. This is another factor (embedding) and also has two levels.
  • The combinations of the factors’ levels (here, 2 × 2 = four of them) are also called conditions.
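As a small illustration, the conditions of a factorial design are just the cross product of the factor levels. The factor and level names below follow the relative-clause example; the variable names are assumptions for this sketch.

```python
from itertools import product

# Two factors with two levels each, as in the relative-clause example.
factors = {
    "rc_type": ["subject", "object"],
    "embedding": ["center-embedded", "right-branching"],
}

# Every combination of one level per factor is one condition.
conditions = [dict(zip(factors, levels)) for levels in product(*factors.values())]
```

Adding a third two-level factor would double the number of conditions, which is one reason designs are usually kept small.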


Methodology: Experimental design

(13)

  • a. Der Vater,  DER den Sohn beschimpft, ist normalerweise    ein ausgesprochen geduldiger Mann.
       The father, who the son  scolds,     is  usually/normally an  extremely     patient    man.
       ‘The father who scolds the son is normally an extremely patient man.’
  • b. Der Vater, DEN der Sohn, . . .
  • c. Der Vater ist normalerweise ein ausgesprochen geduldiger Mann, DER den Sohn beschimpft
  • d. Der Vater ist normalerweise ein ausgesprochen geduldiger Mann, DEN der Sohn beschimpft

An important step is to decide what your dependent measure will be; this depends on your research question and your hypotheses’ predictions.


Experimental design: Some problems

  • Each stimulus item we create faces the danger that it has some idiosyncratic property, nothing to do with the experiment, that will affect the dependent measure.
  • We can’t show all versions (conditions) of the same item to each subject: the responses of a subject to the conditions could then not be fairly compared.
  • The order of presentation can easily influence the response of a subject.

And, for statistical reasons, we have to make sure each subject sees all conditions. What to do?


Experimental design: Counterbalanced designs

In our 2 × 2 design example, let’s call the four conditions A. . . D. Each version of each sentence is distributed across four groups as shown below:

Sentence No.   Group 1   Group 2   Group 3   Group 4
 1             A         B         C         D
 2             B         C         D         A
 3             C         D         A         B
 4             D         A         B         C
 5             A         B         C         D
 6             B         C         D         A
 7             C         D         A         B
 8             D         A         B         C
 9             A         B         C         D
10             B         C         D         A
11             C         D         A         B
12             D         A         B         C
13             A         B         C         D
14             B         C         D         A
15             C         D         A         B
16             D         A         B         C
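The rotation in this table is mechanical, so the lists can be generated rather than typed. A minimal sketch, assuming 1-based sentence and group numbers; the function name and the returned data structure are choices made for this example.

```python
def latin_square_lists(n_items, conditions=("A", "B", "C", "D")):
    """Return {group: [(item, condition), ...]} for a counterbalanced design.

    Item i (1-based) is shown to group g (1-based) in condition number
    (i + g - 2) mod k, which rotates the conditions across groups.
    """
    k = len(conditions)
    return {
        group: [(item, conditions[(item + group - 2) % k])
                for item in range(1, n_items + 1)]
        for group in range(1, k + 1)
    }

lists = latin_square_lists(16)
```

Here `lists[1]` is the presentation list for Group 1 and so on: every group sees each sentence exactly once and each condition equally often, and across groups every sentence appears in all four conditions.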


Experimental design: Counterbalanced designs

Then, each subject is randomly assigned to one group.

Sentence No.   Group 1   Group 2   Group 3   Group 4
 1             A         B         C         D

  • The hope is that if sentence 1 has some idiosyncratic property independent of the research problem, then this will be evenly distributed across the four conditions. We can check for this: if there is a statistically significant group effect, we have a problem.
  • Randomization is trickier: it’ll eliminate straightforward order effects if there would have been any, but it can introduce noise, perhaps an artefact of higher-order order effects?
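Random assignment to groups and randomized presentation order are easy to script. The sketch below is one possible way to do it, assuming we want equal group sizes and a separate shuffle per subject; the subject labels, function names, and seeds are hypothetical.

```python
import random

def assign_groups(subjects, n_groups=4, seed=None):
    """Randomly assign subjects to groups, keeping group sizes as equal as possible."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to groups 1..n_groups in turn.
    return {subj: 1 + i % n_groups for i, subj in enumerate(shuffled)}

def presentation_order(items, seed=None):
    """Return a shuffled copy of the item list for one subject."""
    rng = random.Random(seed)
    order = list(items)
    rng.shuffle(order)
    return order

groups = assign_groups([f"s{i:02d}" for i in range(1, 17)], seed=1)
order = presentation_order(range(1, 17), seed=2)
```

Fixing the seed makes a session reproducible; in practice one would also interleave filler sentences, which this sketch leaves out.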


Experimental design: Statistical data analysis

  • Once the dependent measure has been obtained by running the experiment, analysis typically consists of looking at the main effect of each factor, and whether an interaction exists.
  • Typically there are two kinds of analyses done: by subjects and by items. The idea is that we want to generalize from the sample of subjects to the corresponding population, and from the sample of items to the population of possible sentences.
  • The by-subjects and by-items analyses are often reported separately in terms of ANOVA (F) values, as F1 and F2 respectively. A difference between two conditions is deemed significant if the p-value falls below 0.05.
  • The details of what this all really means require some study; if you are interested (you should be worried if you are not), you can work through one of many tutorials on data analysis available from the course web page.
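To show what "by subjects" and "by items" mean in practice, here is a stripped-down sketch for a single factor with two conditions. It averages over items within each subject (for F1) or over subjects within each item (for F2) and computes the F value as the square of a paired t statistic, which is equivalent for a two-level within-unit comparison. The reading times are made up, the function name is hypothetical, and p-values and the full 2 × 2 ANOVA would normally come from a statistics package.

```python
from math import sqrt
from statistics import mean, stdev

def paired_f(data, unit):
    """F(1, n-1) for a two-condition within-unit comparison (paired t squared).

    data: list of dicts with keys 'subject', 'item', 'condition', 'rt'
    unit: 'subject' for F1, 'item' for F2
    """
    cells = {}
    for row in data:
        cells.setdefault((row[unit], row["condition"]), []).append(row["rt"])
    units = sorted({row[unit] for row in data})
    # One difference score (B minus A) per subject or per item.
    diffs = [mean(cells[(u, "B")]) - mean(cells[(u, "A")]) for u in units]
    t = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))
    return t ** 2

# Hypothetical reading times (ms): 4 subjects x 4 items, two conditions,
# assigned so that every subject and every item occurs in both conditions.
data = [
    {"subject": s, "item": i,
     "condition": "A" if (s + i) % 2 == 0 else "B",
     "rt": 400 + 10 * s + 5 * i + (0 if (s + i) % 2 == 0 else 40 + 3 * s - 2 * i)}
    for s in range(1, 5) for i in range(1, 5)
]

f1 = paired_f(data, "subject")   # by-subjects analysis
f2 = paired_f(data, "item")      # by-items analysis
```

The two values generally differ because the variability across subjects is not the same as the variability across items.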


Methodology: Techniques

  • Acceptability ratings, including magnitude estimation (“offline”) DEMO
  • Self-paced reading (“online”) DEMO
  • “Got-it” tasks
  • Speeded acceptability judgements
  • Speed-accuracy tradeoff designs
  • Eyetracking
  • Event-related potentials
  • Functional Magnetic Resonance Imaging
  • Corpus-based methods: frequency of words/structures, subcategorization, category and sense bias


Plan for the coming days

We start with ACT-R, a cognitive architecture inspired by Newell’s idea of functionally complete architectures. We will begin with some theory, then over the coming days build some toy models to get a good understanding of some of the key details of the architecture, and finally we’ll look at a sentence processing model built onto ACT-R.

Then, we’ll start looking at connectionist architectures, do some modeling/simulations ourselves, and then talk about some key papers in the field.

Finally, we’ll take a brief look at probabilistic models of sentence processing, and look at one of these (Hale’s entropy reduction hypothesis) in some detail.


References

Altmann, E. M., & Schunn, C. D. (2002). Integrating decay and interference: A new look at an old interaction. In Proceedings of the 24th annual meeting of the cognitive science society (pp. 65–70). Hillsdale, New Jersey: Erlbaum.

Anderson, J., & Lebiere, C. (2003). The Newell test for a theory of mind. Behavioral and Brain Sciences, 26, 587–637.

Anderson, J. R., & Lebiere, C. (Eds.). (1998). The Atomic Components of Thought. Mahwah, New Jersey: Lawrence Erlbaum Associates.

Brown, J. (1958). Some tests of the decay theory of immediate memory. Quarterly Journal of Experimental Psychology, 10, 173–189.

Brysbaert, M., & Mitchell, D. C. (1996). Modifier attachment in sentence parsing: Evidence from Dutch. Quarterly Journal of Experimental Psychology, 49A(3), 696–714.

Cuetos, F., & Mitchell, D. C. (1988). Cross-linguistic differences in parsing: Restrictions on the use of the Late Closure strategy in Spanish. Cognition, 30, 73–105.

De Vincenzi, M., & Job, R. (1993). Some observations on the universality of the late closure strategy. Journal of Psycholinguistic Research, 22, 189–206.

Fernández, E. M., Bradley, D., Igoa, J. M., & Teira, C. (2003). Prosodic phrasing in the RC-attachment ambiguity: Effects of language, RC-length, and position. In Proceedings of the AMLaP (Architectures and Mechanisms of Language Processing) conference. Glasgow, Scotland.

Fernández, E. M., Bradley, D., & Taylor, D. (2005). Prosody and informativeness in the relative clause attachment ambiguity. (MS (submitted))

Frazier, L. (1979). On comprehending sentences: Syntactic parsing strategies. Unpublished doctoral dissertation, University of Massachusetts, Amherst.

Gibson, E., & Pearlmutter, N. J. (2000). Distinguishing serial and parallel parsing. Journal of Psycholinguistic Research, 29(2), 231–240.

Hale, J. (2003). The information conveyed by words in sentences. Journal of Psycholinguistic Research, 32(2), 101–123.

Hemforth, B., Konieczny, L., & Scheepers, C. (1994). Principle-based and probabilistic approaches to human parsing: How universal is the human language processor? In H. Trost (Ed.), Tagungsband KONVENS 1994. Berlin: Springer.

Henson, R. N. A. (1998). Short-term memory for serial order: The Start-End model. Cognitive Psychology, 36, 73–137.

Henson, R. N. A. (1999). Coding Position in Short-Term Memory. International Journal of Psychology, 34(5/6), 403–409.

Keppel, G., & Underwood, B. J. (1962). Proactive inhibition in short-term retention of single items. Journal of Verbal Learning and Verbal Behavior, 1, 153–161.

Lewis, R. L. (2000). Falsifying serial and parallel parsing models: empirical conundrums and an overlooked paradigm. Journal of Psycholinguistic Research, 29(2), 241–248.

Lewis, R. L., & Nakayama, M. (2001). Syntactic and positional similarity effects in the processing of Japanese embeddings. In M. Nakayama (Ed.), Sentence Processing in East Asian Languages (pp. 85–113). Stanford, CA.

McElree, B. (2000). Sentence comprehension is mediated by content-addressable memory structures. Journal of Psycholinguistic Research, 29(2), 111–123.

McElree, B., & Dosher, B. A. (1993). Serial retrieval processes in the recovery of order information. Journal of Experimental Psychology: General, 122(3), 291–315.

McElree, B., Foraker, S., & Dyer, L. (2003). Memory structures that subserve sentence comprehension. Journal of Memory and Language, 48, 67–91.

Mitchell, D. C., Cuetos, F., & Zagar, D. (1990). Reading in different languages: Is there a universal mechanism for parsing sentences? In D. A. Balota, G. B. Flores d’Arcais, & K. Rayner (Eds.), Comprehension processes in reading. Hillsdale, New Jersey: Erlbaum.

Neath, I. (2003). Human Memory: An introduction to research, data, and theory. Pacific Grove, CA: Brooks/Cole.

Newell, A. (1973). You can’t play 20 questions with nature and win: Projective comments on the papers of this symposium. In W. G. Chase (Ed.), Visual information processing (pp. 283–308). New York: Academic Press.

Peterson, L. R., & Peterson, M. J. (1959). Short-term retention of individual items. Journal of Experimental Psychology, 61, 12–21.

Vasishth, S., Agnihotri, R. K., Fernández, E. M., & Bhatt, R. (2005). Noun modification preferences in Hindi. In Proceedings of construction of knowledge conference. Vidya Bhawan Society, Udaipur.

Waugh, N. C., & Norman, D. A. (1965). Primary memory. Psychological Review, 72, 89–104.
