MORE THAN WORDS: A DISCRIMINATIVE LEARNING MODEL WITH LEXICAL BUNDLES




slide-1
SLIDE 1

MORE THAN WORDS

A DISCRIMINATIVE LEARNING MODEL WITH LEXICAL BUNDLES

March 8th, 2017

Saskia E. Lensink, R. Harald Baayen
s.e.lensink@hum.leidenuniv.nl

slide-2
SLIDE 2

Contents

■ Multi-word units and their cognitive reality
■ Experimental methods
■ Computational model of multi-word units
■ Eye-tracking study
■ Production study
■ Results and implications

slide-3
SLIDE 3

A typology of multi-word units


Wray (2012)

slide-4
SLIDE 4

Multi-word units

■ Indicator of nativeness
■ Thought to be represented as a whole
■ How can we experimentally test for the cognitive reality of these multi-word units?

slide-5
SLIDE 5

Multi-word frequencies

Previous studies have found an effect of the frequencies of regular multi-word units, which suggests storage of wholes

slide-6
SLIDE 6

Previous studies

■ self-paced reading: Tremblay, Derwing, Libben, & Westbury, 2011
■ phrasal decision tasks: Arnon & Snider, 2010; Ellis & Simpson-Vlach, 2009
■ priming of the last word of the n-gram: Ellis & Simpson-Vlach, 2009
■ word reading tasks: Arnon & Priva, 2013; Ellis & Simpson-Vlach, 2009; Han, 2015; Tremblay & Tucker, 2011
■ picture naming: Janssen & Barber, 2012
■ sentence recall: Tremblay et al., 2011
■ immediate free recall: Tremblay & Baayen, 2010
■ eye-tracking: Siyanova-Chanturia, Conklin, & Van Heuven, 2011
■ ERPs: Tremblay & Baayen, 2010
■ L1 language acquisition: Bannard & Matthews, 2008
■ L2 speakers: Conklin & Schmitt, 2012; Han, 2015; Jiang & Nekrasova, 2007; Siyanova-Chanturia et al., 2011

slide-7
SLIDE 7

Frequency is an impoverished measure

■ Collapses counts of homophones
■ Collapses counts of different senses
■ Language always occurs in context; prediction also plays a large role in processing
■ Salience and recency also play a role

slide-8
SLIDE 8

Mind the neighbors!

■ When studying words, we pay attention to
  – Frequency effects
  – Length
  – Neighborhood density effects
■ When studying multi-word units, we pay attention to
  – Frequency effects
  – Length
  – But not to neighborhood density effects!

slide-9
SLIDE 9

Motivation for our study

■ The framework of discriminative learning has given us new insights into language
■ A computational model implementing discriminative learning, NDL, provides a measure reflecting neighborhood density effects
■ Adding measures from discriminative learning to our models of multi-word unit processing might yield new insights into how these units are processed
■ We conducted both an eye-tracking study and a production study to investigate comprehension and production

slide-10
SLIDE 10

NDL

Baayen et al., 2011

■ Naïve Discriminative Learning
■ Implements the Rescorla-Wagner equations, which specify how experience alters the strength of association of a cue to a given outcome
■ Uses distributional properties of corpus data, applying basic principles of error-driven learning
■ Weights from cues to outcomes are adjusted depending on correct/incorrect prediction of an outcome given a certain cue

This approach successfully predicted word frequency effects, morphological family size effects, inflectional entropy effects, and phrasal frequency effects
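The error-driven weight update described on this slide can be sketched as follows. This is a minimal illustration of the Rescorla-Wagner rule, not the NDL implementation used in the study. The function name `rescorla_wagner_update`, the learning rate `rate`, the maximum association strength `lam`, and the toy learning events are all assumptions made for illustration.

```python
from collections import defaultdict


def rescorla_wagner_update(weights, cues, outcomes, all_outcomes, rate=0.01, lam=1.0):
    """Apply one Rescorla-Wagner learning event in place (illustrative sketch).

    weights: dict mapping (cue, outcome) -> association strength.
    cues: cues present in this event.
    outcomes: outcomes present in this event.
    all_outcomes: every outcome the learner knows about.
    """
    cues = set(cues)
    outcomes = set(outcomes)
    for outcome in all_outcomes:
        # Summed support that the present cues currently give this outcome.
        total = sum(weights[(cue, outcome)] for cue in cues)
        # The target is lam when the outcome occurs and 0 when it does not.
        target = lam if outcome in outcomes else 0.0
        delta = rate * (target - total)
        # Only cues present in the event have their weights adjusted.
        for cue in cues:
            weights[(cue, outcome)] += delta
    return weights


# Hypothetical toy events: word cues paired with a trigram outcome.
events = [
    (["in", "the", "end"], ["in_the_end"]),
    (["in", "the", "middle"], ["in_the_middle"]),
] * 200
all_outcomes = ["in_the_end", "in_the_middle"]

weights = defaultdict(float)
for cues, outcomes in events:
    rescorla_wagner_update(weights, cues, outcomes, all_outcomes)
```

In this toy setup the word "end" occurs only with the outcome `in_the_end`, so it ends up with a stronger weight to that outcome than the shared words "in" and "the", while "middle" acquires a negative weight to it.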

slide-11
SLIDE 11

NDL

Baayen et al., 2011

■ Outcomes are thought of as pointers to locations in a multi-dimensional semantic space
■ These locations are constantly updated by the experiences a language user has

slide-12
SLIDE 12

NDL with lexical bundles


slide-13
SLIDE 13

Weight word X


Bottom-up information

slide-14
SLIDE 14

Total activation trigram (act)


Bottom-up information

slide-15
SLIDE 15

Prior activation trigram


Top-down information

slide-16
SLIDE 16

Activation diversity


Competing trigrams – neighborhood density
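The four measures named on slides 13 to 16 can be illustrated on a toy weight matrix. The slides give no formulas, so the definitions below are assumptions borrowed from common usage in the NDL literature: activation is the summed weight from the words of a trigram to its own outcome, the prior is the L1 norm of all weights pointing to that outcome, and activation diversity is the L1 norm of the activations that the trigram's words send to all outcomes. The weights, trigram names, and function names are made up.

```python
# Hypothetical cue-to-outcome weights: weights[cue][outcome].
weights = {
    "in":     {"in_the_end": 0.2, "in_the_middle": 0.2},
    "the":    {"in_the_end": 0.2, "in_the_middle": 0.2},
    "end":    {"in_the_end": 0.6, "in_the_middle": -0.4},
    "middle": {"in_the_end": -0.4, "in_the_middle": 0.6},
}


def activation(weights, cues, outcome):
    """Bottom-up support: summed weights from the given cues to one outcome."""
    return sum(weights[cue].get(outcome, 0.0) for cue in cues)


def prior(weights, outcome):
    """Top-down availability (assumed definition): L1 norm of all weights to the outcome."""
    return sum(abs(row.get(outcome, 0.0)) for row in weights.values())


def activation_diversity(weights, cues):
    """Competition (assumed definition): L1 norm of the activations over all outcomes."""
    outcomes = {o for row in weights.values() for o in row}
    return sum(abs(activation(weights, cues, o)) for o in outcomes)


cues = ["in", "the", "end"]
w3 = weights["end"]["in_the_end"]           # weight of word 3 to the trigram
act = activation(weights, cues, "in_the_end")
pri = prior(weights, "in_the_end")
div = activation_diversity(weights, cues)
```

Under these assumed definitions, a trigram whose words also activate many other trigrams receives a higher activation diversity, which is how the measure can stand in for neighborhood density.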

slide-17
SLIDE 17

Eye-tracking experiment

■ Picture of an eye-tracker, an eye, or similar

Eye tracking

slide-18
SLIDE 18

Stimuli

■ Most common n-grams (trigrams) from corpus
■ OpenSoNaR corpus
■ Use frequencies extracted from a corpus of Dutch subtitles (N = 109,807,716)
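Extracting the most common trigrams from a corpus can be sketched as follows. The three-sentence corpus is a made-up stand-in, whitespace tokenization is a simplifying assumption, and the function name `count_trigrams` is hypothetical. The study itself took its stimuli and frequencies from the corpora named on this slide.

```python
from collections import Counter


def count_trigrams(sentences):
    """Count word trigrams within each sentence (no trigram spans a sentence boundary)."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Zipping three offset views yields every run of three adjacent tokens.
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts


# Hypothetical toy corpus.
corpus = [
    "ik weet het niet",
    "ik weet het wel",
    "dat weet ik niet",
]
trigram_counts = count_trigrams(corpus)
most_common = trigram_counts.most_common(1)
```

Here `most_common` holds the single most frequent trigram together with its count. On a real corpus one would take a larger cut-off and usually normalize the counts, for example per million words.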

slide-19
SLIDE 19

Procedure

■ Silent reading
■ Comprehension questions to ascertain attentive reading
■ 30 participants (10 male)
■ Analyzed using generalized additive mixed-effects models (GAMMs)

slide-20
SLIDE 20

Modeling data

■ Test whether, and to what extent, NDL measures give us insights over and above more traditional frequency measures
■ Some frequency and NDL measures are highly collinear, e.g. ‘freqABC’ and ‘prior’
■ Models with just frequencies performed worse than models with both frequencies and NDL measures
■ Neighborhood density effects are best reflected by the Activation Diversity measure, which was a significant predictor in several models
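Collinearity between predictors such as ‘freqABC’ and ‘prior’ can be checked before model fitting, for instance with a pairwise correlation. The sketch below uses made-up numbers, not the study's data, and a hand-written Pearson correlation. The variable names mirror the slide, but the values and the 0.9 threshold are purely illustrative assumptions.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up values for six trigrams: a log trigram frequency and an NDL prior.
freq_abc = [2.1, 3.4, 4.0, 5.2, 6.1, 7.3]
prior = [0.8, 1.1, 1.5, 1.9, 2.2, 2.9]

r = pearson(freq_abc, prior)
# A high absolute correlation signals that the two predictors overlap heavily.
collinear = abs(r) > 0.9
```

When two predictors overlap this strongly, their separate contributions are hard to tell apart in one model, which is why the comparison of frequency-only models against models with both kinds of measures matters.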

slide-21
SLIDE 21

First fixation durations

[Figure: first fixation durations. Axis labels: ActDivTrigram, FreqABC, FreqC, firstFixX]

slide-22
SLIDE 22

Second fixation durations

[Figure: second fixation durations. Axis labels: secondFixX, length, prior, weight of word 3]

slide-23
SLIDE 23

Number of fixations

[Figure: number of fixations. Axis labels: secondFixX, firstFixX]

slide-24
SLIDE 24

Discussion eye-tracking data

■ Already in the first fixation, there are effects of the trigram frequencies and the third word
■ Processes of top-down information (frequency effects), bottom-up information (activations) and uncertainty reduction (activation diversity/neighborhood effects)
■ Knowledge verification (frequencies): a reader spends more time in early measures with higher frequencies and if enough information is available; if not, a new fixation is planned as soon as possible
■ Bottom-up information (w3): when further into the trigram at your second fixation, it pays to spend more time to resolve things locally if the third word provides a lot of support for the trigram. If not, participants are faster to refixate
■ Uncertainty reduction (neighborhood density): if there are many competing trigrams, shorter looking times in first fixations and a higher number of fixations

slide-25
SLIDE 25

General discussion

■ Multi-word units are relevant units of storage (also in Dutch)
■ Both single words and the full trigram play a role
■ Adding measures from a discriminative model provides us with new insights into the processing of MWUs
■ Considering neighborhood density effects provides us with more insights into the workings of MWU processing
■ In processing of multi-word units, opposing forces of top-down information, bottom-up information and uncertainty reduction are at work

slide-26
SLIDE 26

Questions?

slide-27
SLIDE 27

Extra slides – production


slide-28
SLIDE 28

Production experiments


slide-29
SLIDE 29

Procedure

■ Same stimuli as used in the eye-tracking study
■ Word reading task
■ 30 participants (8 male)
■ Onsets and durations measured using Praat
■ Analyzed using generalized additive mixed-effects models (GAMMs)

slide-30
SLIDE 30

Production onsets


slide-31
SLIDE 31

Production durations


slide-32
SLIDE 32

A trade-off

[Figure panels: naming latencies, durations]

slide-33
SLIDE 33

Discussion production data

■ Processes of top-down information (frequency effects), bottom-up information (activations) and uncertainty reduction (activation diversity/neighborhood effects)
■ There is a trade-off between starting early and being able to pronounce the trigram fast
■ Top-down information slows you down at first, but makes total durations shorter (longer to plan, but easier motor program to execute)
■ Bottom-up information gives you a quick start but slows you down later (shorter to plan, but harder motor program to execute)
■ Neighborhood effects are apparent in production durations: longer durations when the number of neighbors is different from the average (less motor practice)