


Cognitive Modeling

Lecture 13: Connectionist Networks: Models of Language Processing

Frank Keller, School of Informatics, University of Edinburgh

keller@inf.ed.ac.uk

Cognitive Modeling: Connectionist Models of Language Processing – p.1

Overview

Reading aloud (orthography-phonology mapping):

• models adult performance; good performance on known and unknown words;
• models (normal) human behavior;
• fails to replicate the double dissociation (in dyslexia);
• importance of input and output representations.

Reading: McLeod et al. (1998, Ch. 8).


Reading Aloud

Task: produce the correct pronunciation for a word, given its printed form. Suited to connectionist modeling:

• we need to learn mappings from one domain (print) to another (sound);
• multi-layer networks are good at this, even when mappings are somewhat arbitrary;
• human learning is similar to network learning: it takes place gradually, over time, and incorrect attempts are later corrected.

If a network can't model this linguistic task successfully, it would be a serious blow to connectionist modeling.


Dual Route Model

Standard model: two independent routes leading to pronunciation, because:

• people can pronounce words they have never seen: SLINT or MAVE;
• people can pronounce words which break the rules: PINT or HAVE.

One mechanism uses general rules for pronunciation; the other stores pronunciation information with specific words.



Behavior of Dual-Route Models

Consider: KINT, MINT, and PINT.

KINT is not a word:
• no entry in the lexicon;
• can only be pronounced using the rule-based mechanism.

MINT is a word:
• can be pronounced using the rule-based mechanism;
• but it exists in the lexicon, so it can also be pronounced by the lexical route.

PINT is a word, but irregular:
• can only be correctly pronounced by the lexical route;
• otherwise, it would rhyme with MINT.


Evidence for the Dual-Route Model

Double dissociation: neuropsychological evidence shows differential behavior for two types of brain damage:

• phonological dyslexia: symptom: words are read without difficulty, but pronunciations for non-words can't be produced; explanation: damage to the rule-based route; lexical route intact;
• surface dyslexia: symptom: words and non-words are produced correctly, but errors occur on irregulars (tendency to regularize); explanation: damage to the lexical route; rule-based route intact.

All dual-route models share:

• a lexicon for known words, with specific pronunciation information;
• a rule mechanism for the pronunciation of unknown words.


Towards a Connectionist Model

It is unclear how a connectionist model could naturally implement a dual-route model:

• no obvious way to implement a lexicon to store information about particular words; storage is typically distributed;
• no clear way to distinguish specific information from general rules; there is only one uniform way to store information: connection weights.

Examine the behavior of a standard 2-layer feedforward model (Seidenberg and McClelland, 1989):

• trained to pronounce all the monosyllabic words of English;
• learning implemented using standard backpropagation.


Seidenberg and McClelland (1989)

2-layer feed-forward model:

• distributed representations at input and output;
• distributed knowledge within the net;
• gradient descent learning.

Input and output:

• inputs are activated by the letters of the words: 20% activated, on average;
• outputs represent the phonological features: 12% activated, on average;
• the exact encoding of features does not affect the success.

Processing: net_i = Σ_j w_ij a_j + bias_i, followed by a logistic activation function.
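The processing step can be sketched as follows; the layer sizes, weights, and inputs are illustrative toy values, not the model's actual parameters:

```python
import numpy as np

def logistic(x):
    """Logistic activation: squashes the net input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(a, W, bias):
    """One processing step: net_i = sum_j w_ij * a_j + bias_i,
    followed by the logistic activation function."""
    return logistic(W @ a + bias)

# Toy example: 3 input units, 2 output units.
a = np.array([1.0, 0.0, 1.0])            # input activations
W = np.array([[0.5, -0.3, 0.2],
              [-0.1, 0.4, 0.6]])         # connection weights w_ij
bias = np.array([0.0, -0.2])
out = forward(a, W, bias)                # activations, each in (0, 1)
```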


slide-3
SLIDE 3

Training the Model

Learning:

• weights and biases are initially random;
• words are presented, and outputs are computed;
• connection weights are adjusted using backpropagation of error.

Training:

• all monosyllabic words of 3 or more letters (about 3000);
• in each epoch, a subset was presented: frequent words appeared more often;
• over 250 epochs, THE was presented 230 times, the least common word 7 times.

Performance:

• outputs were considered correct if the pattern was closer to the correct pronunciation than to that of any other word;
• after 250 epochs, accuracy was 97%.
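The frequency-weighted presentation scheme can be sketched like this; the mini-lexicon and its counts are hypothetical stand-ins for the roughly 3000-word training set, and the log-compression of frequencies mirrors S&M's procedure:

```python
import math
import random

# Hypothetical mini-lexicon with raw corpus frequencies.
lexicon = {"the": 69971, "have": 3941, "mint": 12, "pint": 10}

def sample_epoch(lexicon, n_presentations, rng=random):
    """Draw one epoch's training items; frequent words appear more often,
    with frequencies log-compressed so rare words are still seen."""
    words = list(lexicon)
    weights = [math.log(f) for f in lexicon.values()]
    return rng.choices(words, weights=weights, k=n_presentations)

epoch = sample_epoch(lexicon, 250)  # one epoch of frequency-weighted items
```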


Results

The model successfully learns to map most regular and irregular word forms to their correct pronunciation:

• it does this without separate routes for lexical and rule-based processing;
• there is no word-specific memory.

It does not perform as well as humans in pronouncing non-words.

Naming latency: experiments have shown that adult reaction time for naming a word is a function of variables such as word frequency and spelling regularity.


Results

The current model cannot directly mimic latencies, since the computation of outputs takes a constant amount of time. The model can nevertheless be seen as simulating this observation if we relate the output error score to latency:

• the phonological error score is the difference between the actual output pattern and the correct pattern;
• hypothesis: high error should correlate with longer latencies.
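The error-as-latency idea can be sketched with a sum-of-squared-differences error measure; the patterns below are illustrative, not the model's real feature vectors:

```python
import numpy as np

def phonological_error(output, target):
    """Phonological error score: squared distance between the produced
    output pattern and the correct pattern; treated as a latency proxy."""
    output = np.asarray(output, dtype=float)
    target = np.asarray(target, dtype=float)
    return float(np.sum((output - target) ** 2))

target = [1, 0, 1, 0]
good = [0.9, 0.1, 0.8, 0.1]  # close to target: low error, fast naming
poor = [0.5, 0.5, 0.5, 0.5]  # far from target: high error, slow naming
assert phonological_error(good, target) < phonological_error(poor, target)
```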


Word Frequency Effects

Experimental finding: common words are pronounced more quickly than uncommon words.

Conventional (localist) explanation:

• frequent words require a lower threshold of activity for the "word recognition device" to "fire";
• infrequent words require a higher threshold of activity.

In the S&M model, naming latency is modeled by the error:

• word frequency is reflected in the training procedure;
• phonological error is reduced by training, and is therefore lower for high-frequency words.

The explanation of latencies in terms of error follows directly from the network's architecture and the training regime.



Frequency × Regularity

In addition to faster naming of frequent words, subjects exhibit:

• faster pronunciation of regular words (e.g., GAVE or MUST) than irregular words (e.g., HAVE or PINT);
• but this effect interacts with frequency: it is only observed with low-frequency words.

Regulars: small frequency effect; it takes only slightly longer to pronounce low-frequency regulars. Irregulars: large frequency effect. The model mimics this pattern of behavior in the error. Dual-route model: both lexical and rule outcomes are possible; the lexical route wins for high-frequency words (as it is faster).


Frequency × Neighborhood Size

The neighborhood size of a word is defined as the number of words that differ from it by one letter. Neighborhood size has an effect on naming latency:

• not much influence for high-frequency words;
• low-frequency words with small neighborhoods are read more slowly than words with large neighborhoods.

This shows the cooperation of information learned in response to different (but similar) inputs. Again, the connectionist model directly predicts this. Dual-route model: ad hoc explanation, grouping across localist representations of the lexicon.
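Neighborhood size under the definition above (words differing in exactly one letter) is straightforward to compute; the tiny lexicon is illustrative:

```python
def differs_by_one_letter(w1, w2):
    """True if the words have equal length and differ in exactly one position."""
    return len(w1) == len(w2) and sum(a != b for a, b in zip(w1, w2)) == 1

def neighborhood_size(word, lexicon):
    """Count the lexicon words that differ from `word` by exactly one letter."""
    return sum(differs_by_one_letter(word, other) for other in lexicon)

lexicon = {"mint", "pint", "lint", "mist", "mind", "most"}
print(neighborhood_size("mint", lexicon))  # 4 (PINT, LINT, MIST, MIND)
```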


Experimental Data vs. Model

Frequency × regularity and frequency × neighborhood size: [figures comparing experimental latencies with model error; regular words and small neighborhoods are plotted as filled circles, irregular words and large neighborhoods as open squares]


Spelling-to-Sound Consistency

Consistent spelling patterns: all words sharing the pattern have the same pronunciation, e.g., _UST. Inconsistent patterns: more than one pronunciation, e.g., _AVE. Observation: adult readers produce pronunciations more quickly for non-words derived from consistent patterns (NUST) than from inconsistent patterns (MAVE). Dual route: this is difficult to explain, since both are processed by the non-lexical route; the model would need to distinguish consistent and inconsistent rules. The error in the connectionist model predicts this latency effect.



Summary: Seidenberg and McClelland

What has the model achieved?

• a single mechanism with no lexical entries or explicit rules;
• the response to an input is a function of the network's experience: previous experience with that particular word, and experience with words resembling it.

Example: specific experience with HAVE is sufficient to overcome the general information that _AVE is pronounced with a long vowel. The network can still produce a plausible pronunciation for MAVE, but error is introduced by experience with inconsistent words like HAVE.

Performance: 97% accuracy on pronouncing learned words. Accounts for the interaction of frequency with regularity, neighborhood, and consistency.

Limitations: not as good as humans at (a) reading non-words (model 60% correct, humans 90% correct); (b) lexical decision (FRAME is a word, but FRANE is not).


Representations are Important

Position-specific representation: for inputting words of maximum length N, use N groups of 26 binary inputs. But consider: LOG, GLAD, SPLIT, GRILL, CRAWL:

• the model needs to learn the correspondence between L and /l/;
• but L appears in a different position in each word;
• learning different pronunciations for different positions should be straightforward;
• alignment: letters and phonemes are not in 1-to-1 correspondence.

Non-position-specific representation: loses important order information: RAT = ART = TAR.
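The contrast between the two coding schemes can be sketched as follows; the maximum length and the toy words are illustrative:

```python
import string
import numpy as np

ALPHABET = string.ascii_lowercase

def position_specific(word, max_len=5):
    """Position-specific coding: max_len groups of 26 binary units,
    one group per letter position."""
    vec = np.zeros(max_len * 26)
    for pos, ch in enumerate(word):
        vec[pos * 26 + ALPHABET.index(ch)] = 1.0
    return vec

def bag_of_letters(word):
    """Non-position-specific coding: 26 units; order information is lost."""
    vec = np.zeros(26)
    for ch in word:
        vec[ALPHABET.index(ch)] = 1.0
    return vec

# Order survives position-specific coding but not the bag coding:
assert not np.array_equal(position_specific("rat"), position_specific("tar"))
assert np.array_equal(bag_of_letters("rat"), bag_of_letters("tar"))  # RAT = TAR
```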


Representations are Important

Solution: S&M decompose word and phoneme strings into triples of letters:

• FISH = _FI, FIS, ISH, SH_ (note: _ is the word boundary);
• each input unit is associated with 1000 random triples;
• a unit is active if one of its triples appears in the input word.

This representation is called Wickelfeatures. S&M still suffer some position-specific effects: information learned about a letter in one context is not easily generalized.
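The boundary-padded triple decomposition can be sketched as follows; this shows only the triples themselves, not S&M's assignment of 1000 random triples to each input unit:

```python
def letter_triples(word):
    """Decompose a word into overlapping letter triples, with '_' marking
    the word boundary (FISH -> _FI, FIS, ISH, SH_)."""
    padded = "_" + word + "_"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]

print(letter_triples("FISH"))  # ['_FI', 'FIS', 'ISH', 'SH_']
```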


Improving the Model: Plaut et al. (1996)

Plaut et al.'s (1996) solution: a non-position-specific representation with linguistic constraints:

• monosyllabic word = onset + vowel + coda;
• strong constraints on order within these clusters, e.g., if t and s are together in the onset, s always precedes t;
• only one set of grapheme-to-phoneme units is required for the letters in each group;
• correspondences can be pooled across different words, even when letters appear in different positions.



Improving the Model: Plaut et al. (1996)

Input representations:

  • nset: first letter or consonant cluster (30): y s p t k q c b d

g f v j z l m n r w h ch gh gn ph ps rh sh th ts wh;

vowel (27): e I o u a y ai au aw ay ea ee ei eu ew ey ie oa

  • e oi oo ou ow oy ue ui uy q;

coda: final letter or consonant cluster (48): h r l m n b d g

cxf v j s z p t k q bb ch ck dd dg ff gg gh gn ks ll ng nn ph pp ps rr sh sl ss tch th ts tt zz u e es ed. Monosyllabic words are spelled by choosing one or more candidates from each of the 3 possible groups: Example: THROW: (‘th’ + ‘r’), (‘o’), (‘w’).
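A greedy sketch of the onset + vowel + coda decomposition; the grapheme sets below are small illustrative subsets of the full lists above, and the longest-match parse is an assumption, not Plaut et al.'s actual procedure:

```python
# Small illustrative subsets of the onset and vowel grapheme inventories.
ONSETS = {"t", "h", "r", "th", "p", "m", "g", "s", "w"}
VOWELS = {"a", "e", "i", "o", "u", "ea", "ai"}

def decompose(word):
    """Split a monosyllabic word into (onset graphemes, vowel, coda):
    greedily consume onset consonant graphemes, take the longest matching
    vowel grapheme, and treat the remainder as the coda."""
    i, onset = 0, []
    while i < len(word) and word[i] not in "aeiou":
        for n in (2, 1):                     # longest-match onset grapheme
            g = word[i:i + n]
            if g in ONSETS:
                onset.append(g)
                i += len(g)
                break
        else:
            break
    vowel = ""
    for n in (2, 1):                         # longest-match vowel grapheme
        g = word[i:i + n]
        if g in VOWELS:
            vowel = g
            i += len(g)
            break
    return onset, vowel, word[i:]            # remainder = coda

print(decompose("throw"))  # (['th', 'r'], 'o', 'w')
```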


Output Representations

Just like the input representation, the output representation is subject to linguistic constraints. Phonology: groups of mutually exclusive members:

• onset (23): s S C; z Z j f v T D p b t d k g m n h; l r w y;
• vowel (14): a e i o u @ ∧ A E I O U W Y;
• coda (24): r; s z; l; f v p k; m n N; t; b g d; S Z T D C j; ps ks ts.

Example: /scratch/ = s k r a _ _ _ _ _ _ _ _ C.


Network Architecture and Performance

The architecture of the Plaut et al. (1996) network:

• a total of 105 possible orthographic onsets, vowels, and codas;
• 61 possible phonological onsets, vowels, and codas.

Performance of the Plaut et al. (1996) model:

• succeeds in learning both regular and exception words;
• produces the frequency × regularity interaction;
• demonstrates the influences of frequency and neighborhood size.


Network Architecture and Performance

What is the performance on non-words?

• for non-words from consistent patterns (HEAN, cf. DEAN): model 98% versus humans 94%;
• for non-words from inconsistent patterns (HEAF, cf. DEAF/LEAF): model 72%, humans 78%;
• this reflects the production of regular forms: both humans and the model produced both.

This highlights the importance of encoding, and how much knowledge is implicit in the coding scheme.



Discussion

Word frequencies:

• Seidenberg and McClelland (1989) presented training materials according to the log frequencies of words;
• people must deal with absolute frequencies, which might lead the model to see low-frequency items too rarely;
• the Plaut et al. (1996) model, however, succeeds with absolute frequencies.

The model doesn't explain the double dissociation:

• it models surface dyslexia (can read non-words, but not irregulars);
• it doesn't model phonological dyslexia (can read words, but not non-words).


Discussion

Representations:

• the right encoding scheme is essential for modeling the findings: how much linguistic knowledge is given to the network by Plaut et al.'s encoding?
• they assume this knowledge could be partially acquired prior to reading, i.e., children learn to pronounce words before they can read them;
• the representation doesn't scale to polysyllabic words.


Summary

• Dissociations in performance do not necessarily entail distinct mechanisms;
• the reading-aloud model shows that a single mechanism explains regular and irregular pronunciation of monosyllabic words;
• but explaining double dissociations is difficult (this has been shown for small networks, but it is unclear whether larger, more plausible networks can demonstrate double dissociations);
• connectionist models excel at finding structure and patterns in the environment: they are 'statistical inference machines':
  • the start state for learning may be relatively simple and unspecified;
  • the necessary constraints to aid learning come from the environment (or from the representations used).

Can such models scale up? Are they successful for languages with different distributional properties?


References

McLeod, Peter, Kim Plunkett, and Edmund T. Rolls. 1998. Introduction to Connectionist Modelling of Cognitive Processes. Oxford: Oxford University Press.

Plaut, David C., James L. McClelland, Mark S. Seidenberg, and Karalyn Patterson. 1996. Understanding normal and impaired word reading: Computational principles in quasi-regular domains. Psychological Review 103: 56–115.

Seidenberg, Mark S., and James L. McClelland. 1989. A distributed, developmental model of word recognition and naming. Psychological Review 96: 523–568.
