SLIDE 1

Eye-tracking Evidence for Frequency and Integration Cost Effects in Corpus Data

Vera Demberg (1), Frank Keller (1) and Roger Levy (2)

(1) School of Informatics, University of Edinburgh

(2) Department of Linguistics, University of California, San Diego

CUNY 2007, San Diego, CA, March 31, 2007

Vera Demberg, Frank Keller and Roger Levy: Eye-tracking Evidence in Corpus Data, CUNY, March 31, 2007

SLIDE 2

Introduction – Experimental approach

Advantages of experimental approach:

  • controlled conditions
  • established reliability and validity

Drawbacks of experimental approach:

  • sentences are presented out of context
  • sentences are constructed manually by the experimenter
  • bias: do subjects develop special strategies when presented with the same construction many times (even when there are fillers)?
  • only few items from any experiment

SLIDE 3

Main objectives of this work

Use an eye-tracking corpus as complementary evidence to experimental data:

  • reading in context; sentences occur in natural context
  • “real” language, naturally occurring text
  • more data points (for frequent constructions)
  • test on many different constructions
  • but: less controlled conditions

Test predictions for reading times on relative clauses from:

  • SPLT (Syntactic Prediction Locality Theory; Gibson, 1998)
  • Transitional probabilities (McDonald & Shillcock, 2003)

Question: Can we find well-established complexity effects in corpus data?

SLIDE 6

Overview

1. Subject vs. Object Relative Clauses
2. Background: Theories predicting RC reading times
3. The Dundee Corpus
4. Methods: Multiple Hierarchical Linear Regression
5. Results
6. Conclusions

SLIDE 7

Subject vs. Object Relative Clauses

Processing Difficulty and Relative Clauses

Reading times are longer on object relative clauses (ORCs) than on subject relative clauses (SRCs), e.g. (King & Just, 1991; Gibson, 1998).

SRC: The reporter who attacked the senator admitted the error.
ORC: The reporter who the senator attacked admitted the error.

[Figure: reading times per word for the SRC and the ORC example, from "who" to "error"; axis ticks from 100 to 500.]

We compare reading times in two regions:

  • on the main verb within the relative clause
  • in the disambiguating region, i.e. on the first word of the RC, where the ambiguity between SRC and ORC is resolved

SLIDE 10

Background: Theories predicting RC reading times

Theories for Reading Times in RCs

A number of theories have been developed that account for RC reading times:

  • Gibson (1998); Lewis et al. (2006): Locality
  • King & Just (1991): Storage and role changes
  • McDonald & Shillcock (2003): Transitional probabilities
  • Hale (2001); Levy (2007): Surprisal

We pick out just two theories as examples here: integration cost from SPLT and forward transitional probabilities.

SLIDE 11

Background: Theories predicting RC reading times

Syntactic Prediction Locality Theory

SPLT (Gibson, 1998, 20f) makes the following integration cost predictions for the relative clause regions:

SRC:  The  reporter  who   attacked   the   senator    admitted  the   error.
      –    I(0)      I(0)  I(0)+I(1)  I(0)  I(0)+I(1)  I(3)      I(0)  I(0)+I(1)

ORC:  The  reporter  who   the   senator  attacked   admitted  the   error.
      –    I(0)      I(0)  I(0)  I(0)     I(1)+I(2)  I(3)      I(0)  I(0)+I(1)

  • Integration costs occur at the heads of phrases.
  • Main verb of the RC: I(0)+I(1) in the SRC vs. I(1)+I(2) in the ORC, so the main verb in the SRC should be read faster than in the ORC.
  • Disambiguating region: the verb (in SRCs) is more expensive to integrate than the determiner or noun (in ORCs).
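The comparison behind these predictions can be made explicit with a small sketch. It encodes the I(n) annotations from the two example sentences above and, purely as a simplifying assumption, scores I(n) as n units (SPLT itself only requires the cost to grow with n). The names `SRC`, `ORC`, `cost` and `cost_at` are illustrative and not part of SPLT.

```python
# Integration cost annotations copied from the SRC/ORC examples above.
# Each word maps to the list of n values in its I(n) terms; "The" has none.
SRC = [("The", []), ("reporter", [0]), ("who", [0]), ("attacked", [0, 1]),
       ("the", [0]), ("senator", [0, 1]), ("admitted", [3]),
       ("the", [0]), ("error", [0, 1])]
ORC = [("The", []), ("reporter", [0]), ("who", [0]), ("the", [0]),
       ("senator", [0]), ("attacked", [1, 2]), ("admitted", [3]),
       ("the", [0]), ("error", [0, 1])]


def cost(ns):
    """Total cost of one word, ASSUMING I(n) = n (illustrative choice only)."""
    return sum(ns)


def cost_at(sentence, position):
    """Cost of the word at a 0-based position in an annotated sentence."""
    return cost(sentence[position][1])


# Embedded verb "attacked": position 3 in the SRC, position 5 in the ORC.
verb_src, verb_orc = cost_at(SRC, 3), cost_at(ORC, 5)
# Disambiguating region: first word after "who" (position 3 in both).
disambig_src, disambig_orc = cost_at(SRC, 3), cost_at(ORC, 3)
```

Under this scoring the embedded verb is cheaper in the SRC than in the ORC, while the disambiguating word is cheaper in the ORC, which matches the direction of both predictions.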

SLIDE 14

Background: Theories predicting RC reading times

Transitional Probability

Alternative account: shorter reading times are due to higher transitional probabilities (McDonald & Shillcock, 2003).

Claim: P(w_n | w_{n−1}) is predictive of reading times.

Example:

  • verb region: P(attacked | who) > P(attacked | senator)
  • disambig. region: P(the | who) > P(attacked | who)

These probabilities can be estimated from large corpora; we used the British National Corpus (BNC, 100-million-word collection).
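As a rough illustration of how such probabilities can be estimated, the sketch below computes unsmoothed relative-frequency estimates of P(w_n | w_{n−1}) from bigram counts. The toy corpus is made up for illustration and is not BNC data; `transitional_probs` is a hypothetical helper, and the study may well have used a different estimation procedure.

```python
from collections import Counter


def transitional_probs(tokens):
    """Return p(word, prev) = count(prev followed by word) / count(prev)."""
    unigrams = Counter(tokens[:-1])            # contexts that have a successor
    bigrams = Counter(zip(tokens, tokens[1:]))

    def p(word, prev):
        if unigrams[prev] == 0:
            return 0.0                         # unseen context: no estimate
        return bigrams[(prev, word)] / unigrams[prev]

    return p


# Toy corpus (invented for this example).
toy = ("the reporter who attacked the senator left . "
       "the senator who the reporter attacked left . "
       "the editor who the senator praised left .").split()
p = transitional_probs(toy)
```

On this toy corpus, `p("the", "who")` exceeds `p("attacked", "who")`, which in turn exceeds `p("attacked", "senator")`, mirroring the inequalities in the example.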

SLIDE 15

The Dundee Corpus

The Dundee Corpus

Dundee eye-tracking corpus (Kennedy et al., 2003):

  • ca. 51,000 words of British newspaper articles (The Independent)
  • 10 subjects
  • parsed automatically with the Charniak parser (Charniak, 2000)
  • recall: 96%, precision: 92% for detecting RCs on WSJ

Frequency of relative clause types in the Dundee eye-tracking corpus:

pronoun   SRC   ORC   proportion of ORC
that      150    18   10.7%
which      86    39   31.7%
who       137     4    2.8%
total     373    61   14%

SLIDE 18

The Dundee Corpus

Some Example RCs from the Corpus

SRCs:

  • ...titles that seem to stretch the definition a little...
  • ...bag searches that make you wonder whether you’ve come to an underground military center...
  • ...the bodies that deal with the human detritus...

ORCs:

  • ...services that people need or want from computers...
  • ...this no-holds-barred approach to sex and its consequences that many people still associate with the original Cosmo...
  • ...answer – that few of us remained with one employer for our working lives... (parsing error)

SLIDE 19

The Dundee Corpus

Data Selection

434 RCs × 10 subjects = 4340 data points

We excluded all data points

  • where the critical region was the first or last word of a line
  • where the critical region was preceded or followed by a punctuation mark
  • within a region of 4 adjacent words that had not been fixated (tracking error)
  • that contained contractions (e.g. that’ll, who’d)

This left us with approximately 3000 data points. Analyses were only conducted on the fixated data points:

  • approx. 1900 for first fixation times
  • approx. 2200 for total durations
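The selection steps can be sketched as a simple filter. Everything about the record layout here is an assumption made for illustration: the field names, the use of 0.0 to mark a region that was not fixated, and the `DataPoint` class itself do not come from the Dundee corpus format.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    """One subject x one critical region; all field names are hypothetical."""
    line_initial_or_final: bool   # region is first or last word of a line
    adjacent_punctuation: bool    # punctuation directly before or after
    in_unfixated_run: bool        # inside 4 adjacent unfixated words
    has_contraction: bool         # e.g. that'll, who'd
    first_fixation_ms: float      # ASSUMED 0.0 when there is no first fixation
    total_time_ms: float          # ASSUMED 0.0 when the region was never fixated


def keep(dp):
    """True if none of the four exclusion criteria applies."""
    return not (dp.line_initial_or_final or dp.adjacent_punctuation
                or dp.in_unfixated_run or dp.has_contraction)


def select(points):
    """Apply the exclusions, then keep only fixated points per measure."""
    kept = [dp for dp in points if keep(dp)]
    first_fix = [dp for dp in kept if dp.first_fixation_ms > 0]
    total = [dp for dp in kept if dp.total_time_ms > 0]
    return kept, first_fix, total
```

Keeping a separate list per measure reflects the fact that the number of usable data points differs between first fixation times and total durations.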

SLIDE 20

Methods: Multiple Hierarchical Linear Regression

Multiple hierarchical linear regression

Since we don’t closely control the context, we need to regress out possibly confounding factors.

Independent variables:

  • target factors: RC type; log transitional prob.
  • confounding factors: relative pronoun; word length; log word freq.; word’s POS tag; fixation landing position

Dependent variables:

  • first fixation duration
  • gaze duration
  • total reading time

Random variable:

  • subject ID

We entered all variables and their interactions first, then removed stepwise those that decreased model quality (according to AIC).
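The stepwise procedure can be sketched as greedy backward elimination. This is a generic illustration, not the authors' code: model terms are treated as opaque labels, `aic_of` stands for whatever routine refits the model and returns its AIC, and `toy_aic` is an invented scoring function that exists only to make the example run. `gaussian_aic` uses the standard least-squares form of AIC up to an additive constant.

```python
import math


def gaussian_aic(rss, n_obs, n_params):
    """AIC of a least-squares fit, up to an additive constant."""
    return n_obs * math.log(rss / n_obs) + 2 * n_params


def backward_eliminate(terms, aic_of):
    """Drop one term at a time for as long as doing so lowers AIC."""
    current = frozenset(terms)
    best = aic_of(current)
    improved = True
    while improved and current:
        improved = False
        # Score every single-term deletion and pick the best one.
        candidates = [(aic_of(current - {t}), t) for t in sorted(current)]
        cand_aic, cand_term = min(candidates)
        if cand_aic < best:
            current = current - {cand_term}
            best = cand_aic
            improved = True
    return current, best


def toy_aic(terms):
    """Made-up scoring: two terms explain variance, 'pos_tag' barely does."""
    rss = 100.0
    rss -= 30.0 if "rc_type" in terms else 0.0
    rss -= 20.0 if "log_tp" in terms else 0.0
    rss -= 0.01 if "pos_tag" in terms else 0.0
    return gaussian_aic(rss, n_obs=50, n_params=len(terms) + 1)


kept, best = backward_eliminate({"rc_type", "log_tp", "pos_tag"}, toy_aic)
```

In the toy run only `pos_tag` is removed, because its small reduction in residual error does not pay for the extra parameter.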

SLIDE 25

Methods: Multiple Hierarchical Linear Regression

Methods for Linear Regression

  • all data points are entered directly
  • averaging over items or subjects is not necessary due to the use of a more powerful regression method
  • standard approach (Lorch & Myers, 1990):
      • separate regression for each subject
      • t-test over coefficients
  • we used hierarchical linear regression (Richter, 2006):
      • account for variance that is due to subjects on a first “level”
      • the coefficients for the other independent variables are estimated on the second level
      • also known as linear mixed-effects models
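For contrast with the hierarchical model, the standard two-step approach attributed to Lorch & Myers (1990) can be sketched as follows: fit one regression per subject, then test whether the per-subject coefficients differ from zero. The sketch uses a single predictor and invented numbers; the function names and the `toy` data are assumptions for illustration only.

```python
import math
from statistics import mean, stdev


def slope(xs, ys):
    """Least-squares slope of ys on xs (single predictor)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def per_subject_slopes(data):
    """data maps subject ID -> list of (predictor, reading_time) pairs."""
    return {subj: slope([x for x, _ in obs], [y for _, y in obs])
            for subj, obs in data.items()}


def one_sample_t(values):
    """t statistic for H0: mean == 0, with len(values) - 1 degrees of freedom."""
    n = len(values)
    return mean(values) / (stdev(values) / math.sqrt(n))


# Invented data: word length vs. reading time for three subjects.
toy = {
    "s1": [(3, 210.0), (5, 250.0), (8, 310.0)],
    "s2": [(3, 190.0), (5, 220.0), (8, 265.0)],
    "s3": [(3, 230.0), (5, 280.0), (8, 355.0)],
}
slopes = per_subject_slopes(toy)
t_stat = one_sample_t(list(slopes.values()))
```

A mixed-effects model instead estimates the subject-level variance and the fixed-effect coefficients in one fit, so all data points contribute directly.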

SLIDE 26

Results

Results – Main RC Verb

SRC: The reporter who attacked the senator admitted the error.
ORC: The reporter who the senator attacked admitted the error.

Total reading times:

Predictor                  Coeff.    Sign.
(Intercept)                263.42    ***
RC type(SRC)              -177.04    ***
Log transitional prob      -24.73    ***
Length                      21.47    ***
Log frequency              -11.66    **
Word landing position        6.39
Length:landing position     -2.94    ***
Log freq:length             -2.65    ***
RC type(SRC):log freq       18.65    ***

**p < 0.01, ***p < 0.001; R² = 15.6%

  • Verbs are read faster in the SRC condition (as predicted by SPLT).
  • Significant effect of transitional probability in addition to the RC type effect.
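Because RC type also enters an interaction with log frequency, the size of the SRC advantage depends on word frequency. The arithmetic can be sketched as below, using the two RC type coefficients from the total reading time model (-177.04 for RC type(SRC), 18.65 for its interaction with log frequency). Two things are assumed rather than stated in the slides: treatment coding with ORC as the reference level, and an uncentred log frequency predictor. The function names are illustrative.

```python
# Coefficients read off the total reading time model for the main RC verb.
RC_TYPE_SRC = -177.04          # RC type(SRC)
RC_TYPE_SRC_X_LOGFREQ = 18.65  # RC type(SRC):log freq


def src_effect(log_freq):
    """Predicted SRC minus ORC difference at a given log frequency.

    ASSUMES treatment coding with ORC as the reference level and an
    uncentred log frequency predictor; the slides state neither.
    """
    return RC_TYPE_SRC + RC_TYPE_SRC_X_LOGFREQ * log_freq


def crossover_log_freq():
    """Log frequency at which the predicted SRC advantage vanishes."""
    return -RC_TYPE_SRC / RC_TYPE_SRC_X_LOGFREQ
```

Under these assumptions the predicted SRC advantage shrinks as log frequency increases, since the interaction coefficient is positive.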

SLIDE 32

Results

Results – Main RC Verb

SRC: The reporter who attacked the senator admitted the error.
ORC: The reporter who the senator attacked admitted the error.

First pass times:

Predictor                 Coeff.         Sign.
(Intercept)               216.1205141    ***
RC type(SRC)              -42.8087717    *
Length                      7.6596253    **
Log frequency              -2.7113107
Log freq:length            -0.8476891    **
RC type(SRC):log freq       5.3769450    **

*p < 0.05, **p < 0.01, ***p < 0.001; R² = 9.9%

  • RC type effect essentially identical to total reading times
  • no effect of transitional probability
  • equivalent results were obtained for first fixations

SLIDE 35

Results

Results – Disambiguating Region

SRC: The reporter who attacked the senator admitted the error.
ORC: The reporter who the senator attacked admitted the error.

Total reading times:

Predictor                 Coeff.       Sign.
(Intercept)               -205.8891
RC type(SRC)               393.1053    **
Transitional prob          -44.7011    ***
Landing pos                  9.8672    *
Logarithmic frequency       22.0477    **
Length                      28.4211    ***
simplePOS-VP               -31.6457    *
type(SRC):Trans.prob        43.4744    **
type(SRC):Log.freq         -20.2642    *
Log.freq:Length             -1.3892    *
Landing pos:Length          -3.1838    ***

*p < 0.05, **p < 0.01, ***p < 0.001; R² = 10.1%

  • disambiguating region read faster in ORCs (consistent with SPLT)
  • transitional probability also facilitates reading
  • strong correlation between RC type and transitional prob (r = 0.91)
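A correlation between a two-level factor such as RC type and a continuous predictor can be computed as an ordinary Pearson correlation once the factor is coded numerically. The sketch below shows the computation on invented values; the 0/1 coding and the numbers are assumptions, and the sign of the result depends entirely on which level is coded as 1. When two predictors are this strongly correlated, their separate coefficients are harder to estimate reliably, which is worth keeping in mind when reading the table.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented values: RC type coded 1 = SRC, 0 = ORC; log transitional prob.
rc_type = [1, 1, 1, 0, 0, 0]
log_tp = [-6.0, -5.5, -6.5, -2.0, -2.5, -1.5]
r = pearson_r(rc_type, log_tp)
```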

SLIDE 39

Results

Results – Disambiguating Region

SRC: The reporter who attacked the senator admitted the error.
ORC: The reporter who the senator attacked admitted the error.

First fixation durations:

Predictor        Coeff.        Sign.
(Intercept)      195.541736    ***
RC type(SRC)      18.902473    ***
Log frequency     -1.486510    **

**p < 0.01, ***p < 0.001; R² = 8.1%

  • Only RC type and frequency were found to be significant predictors for first fixation times.
  • No significant effect for transitional probabilities here.
  • The first word of the SRC (first word of VP) is read more slowly than the first word of the ORC (first word of NP).

SLIDE 42

Conclusions

Conclusions

  • New type of evidence for locality-based theories (like SPLT).
  • Transitional probability also predicts reading times, but independently of the RC type effect.
  • The RC type effect occurs in both the late and the early measures, while transitional probabilities were only predictive of the late measures.
  • The regression method allows regions to be compared even when they contain different words, because potentially confounding variables are regressed out.
  • The corpus-based methodology can easily be applied to evaluate other theories and to test them on different constructions.
  • Corpus studies serve as complementary evidence to traditional experimental methods.

SLIDE 43

Conclusions

References

Gibson, E. (1998). Linguistic complexity: Locality of syntactic dependencies. Cognition, 68, 1–76.

Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the Second Meeting of the North American Chapter of the Association for Computational Linguistics.

Kennedy, A., Hill, R., & Pynte, J. (2003). The Dundee corpus. Poster at the 12th European Conference on Eye Movements, Dundee.

King, J., & Just, M. A. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language, 30, 580–602.

Levy, R. (2007). Expectation-based syntactic comprehension. Cognition. Accepted.

Lorch, R. F., & Myers, J. L. (1990). Regression analyses of repeated measures data in cognitive research. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16, 149–157.

McDonald, S. A., & Shillcock, R. C. (2003). Low-level predictive inference in reading: The influence of transitional probabilities on eye movements. Vision Research, 43, 1735–1751.

Richter, T. (2006). What is wrong with ANOVA and multiple regression? Analyzing sentence reading times with hierarchical linear models. Discourse Processes, 41, 221–250.
