Part-of-Speech Tagging for Historical English. Yi Yang and Jacob Eisenstein. PowerPoint PPT presentation.



SLIDE 1

Part-of-Speech Tagging for Historical English

Yi Yang and Jacob Eisenstein

Georgia Tech

SLIDE 2

[Muralidharan and Hearst, 2011 & 2012]

  • Digital humanities research
  • How does the portrayal of men and women differ in Shakespeare's plays?
  • What are the language use patterns in North American slave narratives?

SLIDE 3

[Muralidharan and Hearst, 2011 & 2012]

  • NLP can help!
  • Digital humanities research
  • How does the portrayal of men and women differ in Shakespeare's plays?
  • What are the language use patterns in North American slave narratives?

SLIDE 4

[Muralidharan and Hearst, 2011 & 2012]

  • NLP can help!
  • Digital humanities research
  • How does the portrayal of men and women differ in Shakespeare's plays?
  • What are the language use patterns in North American slave narratives?
  • Only if NLP works for historical texts …
SLIDE 5

Early Modern English

Hee said nobody had said anything agt mee .

[Henry Oxinden, 1660]

SLIDE 6

Early Modern English

Hee said nobody had said anything agt mee .

  • Spelling variation: Hee → He, agt → against, mee → me

[Henry Oxinden, 1660]

SLIDE 7

Stanford POS Tagger

Hee said nobody had said anything agt mee .

  • Spelling variation

Stanford: (predicted tags shown on slide)

SLIDE 8

Stanford POS Tagger

Hee said nobody had said anything agt mee .

  • Spelling variation

Stanford vs. Gold: tag rows shown on slide, with three mismatches marked X.

SLIDE 9

Transfer Loss for POS Tagging

(Bar chart of error rate: Modern English 3.0%)

[Rayson et al., 2007]

SLIDE 10

Transfer Loss for POS Tagging

(Bar chart of error rate: Modern English 3.0%, Early Modern English 18.0%)

[Rayson et al., 2007]

SLIDE 11

Approaches

  • Spelling normalization: map from historical spellings to contemporary forms.
    Rayson et al. (2007); Scheible et al. (2011); Bollmann (2011)

SLIDE 12

Approaches

  • Domain adaptation (this work): build robust NLP systems with representation learning.
    Yang & Eisenstein (2014); Yang & Eisenstein (2015)
  • Spelling normalization: map from historical spellings to contemporary forms.
    Rayson et al. (2007); Scheible et al. (2011); Bollmann (2011)

SLIDE 13

Spelling Normalization

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 14

Spelling Normalization

  • Correct normalization: mee → me

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 15

Spelling Normalization

  • Correct normalization: mee → me
  • Incorrect normalization: agt → aged (should be against)

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 16

Spelling Normalization

  • Correct normalization: mee → me
  • Incorrect normalization: agt → aged (should be against)
  • False negative: Hee left unchanged (should be He)

[VARD; Baron and Rayson, 2008]

Original:   Hee said nobody had said anything agt mee .
Normalized: Hee said nobody had said anything aged me .

SLIDE 17

Spelling Normalization

[VARD; Baron and Rayson, 2008]

Normalized: Hee said nobody had said anything aged me .

Stanford vs. Gold: tag rows shown on slide, with three mismatches marked X.

SLIDE 18

Spelling Normalization

[VARD; Baron and Rayson, 2008]

Normalized: Hee said nobody had said anything aged me .

Stanford vs. Gold: tag rows shown on slide; mismatches (X) remain even after normalization.

SLIDE 19

Representation Learning

Hee said nobody had said anything agt mee .


SLIDE 22

Representation Learning

Hee said nobody had said anything agt mee .

  OOV: Hee           contexts: said, was, came, told, …
  IV:  He, I, We, …  contexts: said, was, came, told, …
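The slide's intuition is that the out-of-vocabulary spelling "hee" occurs in the same contexts (said, was, came, told) as the in-vocabulary "he", so distributional statistics can link the two. A minimal sketch of that idea follows; the mini-corpus and function names are illustrative assumptions, not the paper's implementation:

```python
from collections import Counter

def context_counts(sentences, window=1):
    """Count the words appearing within `window` tokens of each word."""
    ctx = {}
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            ctx.setdefault(w, Counter()).update(
                sent[j] for j in range(lo, hi) if j != i)
    return ctx

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[k] * c2[k] for k in c1 if k in c2)
    norm = lambda c: sum(v * v for v in c.values()) ** 0.5
    return dot / (norm(c1) * norm(c2)) if c1 and c2 else 0.0

# Toy corpus: the OOV spelling "hee" shares its contexts with "he".
sents = [["hee", "said", "nothing"], ["he", "said", "nothing"],
         ["hee", "came", "home"], ["he", "came", "home"],
         ["we", "told", "him"]]
ctx = context_counts(sents)
print(cosine(ctx["hee"], ctx["he"]) > cosine(ctx["hee"], ctx["we"]))  # True
```

Embedding methods learn dense vectors with the same effect: words with shared contexts end up close together, so "hee" inherits useful information from "he" even though it never appears in modern training data.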

SLIDE 23

Model

SLIDE 24

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .


SLIDE 26

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Features (1-4) of the first token:
  1. CurrWord = hee
  2. NextWord = said
  3. Prefix1 = h
  4. Suffix1 = e
  …


SLIDE 30

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

Each feature has an input embedding (u) and an output embedding (v); one feature predicts the token's other features:

  p(f_t | f_2) ∝ exp(u_2ᵀ v_t)

SLIDE 31

Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

  p(f_t | f_2) ∝ exp(u_2ᵀ v_t)

  ℓ = Σ_{t=1..T, t≠2} log p(f_t | f_2)
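The skip-gram-style objective above can be sketched in a few lines of NumPy: score every feature against the input embedding of feature 2, softmax over the feature vocabulary, and sum the log-probabilities of the token's other active features. The vocabulary, dimensions, and function names below are toy assumptions for illustration, not the released FEMA code:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy feature vocabulary; list positions double as feature ids.
feats = ["CurrWord=hee", "NextWord=said", "Prefix1=h", "Suffix1=e"]
V, d = len(feats), 8
U = rng.normal(scale=0.1, size=(V, d))  # input embeddings u
W = rng.normal(scale=0.1, size=(V, d))  # output embeddings v

def log_prob(t, s):
    """log p(f_t | f_s): softmax over the whole feature vocabulary."""
    scores = W @ U[s]                      # u_s . v_k for every feature k
    return scores[t] - np.log(np.exp(scores).sum())

def token_loglik(active, s=1):
    """l = sum over t != s of log p(f_t | f_s) for one token's features."""
    return sum(log_prob(t, s) for t in active if t != s)

print(token_loglik([0, 1, 2, 3]))  # a negative log-likelihood
```

Training would follow the usual word2vec recipe (stochastic gradients on U and W, with negative sampling replacing the full softmax at scale); the key difference from word2vec is that the "contexts" are the other feature templates of the same token.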

SLIDE 32

Word Embeddings

[word2vec; Mikolov et al., 2013]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …
Words (1-4):    hee said nobody had …

  • Word embeddings
  • Feature embeddings
SLIDE 36

Word Embeddings

[word2vec; Mikolov et al., 2013]

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …
Words (1-4):    hee said nobody had …

  • Word embeddings: generic representations, learned from word co-occurrences
  • Feature embeddings: task-specific representations, learned from feature co-occurrences
SLIDE 37

Learning from Multiple Domains

[FEMA; Yang and Eisenstein, 2015]

  • Previous work on unsupervised domain adaptation involves two domains.

SLIDE 38

Learning from Multiple Domains

[FEMA; Yang and Eisenstein, 2015]

  • Previous work on unsupervised domain adaptation involves two domains.
  • Unsupervised multi-domain adaptation
SLIDE 40

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

SLIDE 41

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre, Epoch

SLIDE 42

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre = letters, Epoch = 1600+

SLIDE 43

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre = letters, Epoch = 1600+

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

SLIDE 44

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Domain attributes: Genre = letters, Epoch = 1600+

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

Each feature embedding decomposes into a shared part plus one part per domain attribute:

  u = u(shared) + u(letters) + u(1600+)



SLIDE 47

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

The input embedding of feature 2 sums a shared component and one component per domain attribute:

  u_2 = h_2^(shared) + h_2^(letters) + h_2^(1600+)

SLIDE 48

Multiple Feature Embeddings

[FEMA; Yang and Eisenstein, 2015]

Hee said nobody had said anything agt mee .

Features (1-4): CurrWord = hee, NextWord = said, Prefix1 = h, Suffix1 = e, …

  u_2 = h_2^(shared) + h_2^(letters) + h_2^(1600+)

  p(f_t | f_2) ∝ exp(u_2ᵀ v_t)
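The decomposition u_2 = h^(shared) + h^(letters) + h^(1600+) can be sketched as a sum of component vectors, one shared and one per active domain attribute. The dimensions and names below are toy assumptions, not the paper's code; the useful property shown is that in a new domain with unseen attributes, the shared component alone still yields an embedding:

```python
import numpy as np

d = 8
rng = np.random.default_rng(1)

# One component vector per part of the decomposition: a shared part
# plus one part per domain attribute (genre "letters", epoch "1600+").
h = {comp: rng.normal(scale=0.1, size=d)
     for comp in ["shared", "letters", "1600+"]}

def embed(active_domains):
    """u = h(shared) + the sum of the active domain components."""
    return h["shared"] + sum(h[dom] for dom in active_domains)

u2 = embed(["letters", "1600+"])   # a 1600s letter: all three components
u2_new = embed([])                 # unseen domain: shared component only
assert u2.shape == u2_new.shape == (d,)
```

This is why multi-attribute decomposition helps: data from every genre and epoch updates the shared component, while each attribute component captures only what is specific to, say, letters or the 1600s.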

SLIDE 49

Experiments

SLIDE 50

Penn Corpora of Historical English

Modern British English (MBE), # of tokens:
  1700-1769: 322,255
  1770-1839: 427,424
  1840-1914: 343,024

Early Modern English (EME), # of tokens:
  1500-1569: 614,315
  1570-1639: 706,587
  1640-1710: 640,255

[Kroch and Taylor, 2000; Kroch et al., 2004]

SLIDE 51

Tagset Mappings

  • Penn Corpora of Historical English (PCHE) tagset: 83 tags
  • Penn Treebank (PTB) tagset: 45 tags

[Moon and Baldridge, 2007]

SLIDE 52

Tagset Mappings

  • Penn Corpora of Historical English (PCHE) tagset: 83 tags
  • Penn Treebank (PTB) tagset: 45 tags

[Moon and Baldridge, 2007]

  PCHE        PTB
  ADJ         JJ
  ADV, ALSO   RB
  VB, VBI     VB
  …           …
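Such a many-to-one projection is just a lookup table. The sketch below uses only the pairs visible on the slide; the full Moon and Baldridge (2007) mapping covers all 83 PCHE tags, and the dict and function names here are illustrative:

```python
# Many-to-one mapping from PCHE tags onto the 45-tag PTB tagset
# (only the pairs shown on the slide; the full table has 83 entries).
PCHE_TO_PTB = {
    "ADJ": "JJ",
    "ADV": "RB",
    "ALSO": "RB",
    "VB": "VB",
    "VBI": "VB",
}

def map_tags(pche_tags):
    """Project a PCHE tag sequence onto the PTB tagset."""
    return [PCHE_TO_PTB[t] for t in pche_tags]

print(map_tags(["ADJ", "ALSO", "VBI"]))  # ['JJ', 'RB', 'VB']
```

Because the mapping is many-to-one, evaluation in the PTB tagset is lossy: distinctions PCHE draws (e.g. infinitive VBI vs. base VB) disappear, which matters for the error analysis later.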

SLIDE 53

Systems

  • Support vector machine (SVM) tagger
    • Sixteen basic feature templates from Ratnaparkhi (1996)
SLIDE 54

Systems

  • Support vector machine (SVM) tagger
    • Sixteen basic feature templates from Ratnaparkhi (1996)
  • Representation learning methods
    • Structural correspondence learning (SCL)
    • Brown clustering
    • word2vec embeddings
    • Multiple feature embeddings (FEMA)

[Blitzer et al., 2006; Brown et al., 1992; Mikolov et al., 2013]
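To make the feature templates concrete, here is a sketch of what extracting a handful of Ratnaparkhi-style templates for one token might look like. Only a few of the sixteen templates are shown, and the template names are hypothetical, not taken from any released tagger:

```python
def token_features(sent, i):
    """A few Ratnaparkhi (1996)-style templates for token i of sent
    (a real tagger uses sixteen such templates, including longer
    prefixes/suffixes and tag-history features)."""
    w = sent[i]
    feats = {
        "CurrWord": w,
        "PrevWord": sent[i - 1] if i > 0 else "<s>",
        "NextWord": sent[i + 1] if i + 1 < len(sent) else "</s>",
        "Prefix1": w[:1],
        "Suffix1": w[-1:],
        "Suffix2": w[-2:],
    }
    return {f"{k}={v}" for k, v in feats.items()}

sent = "Hee said nobody had said anything agt mee .".split()
print(sorted(token_features(sent, 0)))
```

Each template fires as a sparse binary feature for the SVM; the representation learning methods above then replace or augment these sparse features with dense, domain-robust vectors.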

SLIDE 55

Temporal Adaptation

Modern British English (MBE), # of tokens:
  1700-1769: 322,255
  1770-1839: 427,424
  1840-1914: 343,024

Early Modern English (EME), # of tokens:
  1500-1569: 614,315
  1570-1639: 706,587
  1640-1710: 640,255

Within each corpus, one period is used for training (Train) and the other two for testing (Test 1, Test 2).


SLIDE 58

Results: Modern British English

Average error rate (%):
  Baseline 4.6, SCL 4.4, Brown 4.2, word2vec 4.3, FEMA (our method) 3.7 (-0.9)

SLIDE 61

Results: Early Modern English

Average error rate (%):
  Baseline 9.4, SCL 8.3, Brown 8.0, word2vec 8.2, FEMA (our method) 6.6 (-2.8)

SLIDE 62

Adaptation from PTB

# of tokens:
  Penn Treebank: 969,905 (Train)
  Modern British English: 1,092,703 (Test 1)
  Early Modern English: 1,961,157 (Test 2)

SLIDE 63

Adaptation from PTB

Standard evaluation scenario for English POS tagging.

SLIDE 64

Adaptation from PTB

Standard evaluation scenario for English POS tagging.

Data annotation is insufficient for historical texts:
  • Low-resource languages
  • Specific genres, styles, or epochs


SLIDE 67

Results: Modern British English

Error rate (%):
  Baseline 18.9, SCL 18.3, Brown 18.4, word2vec 18.4, FEMA (our method) 17.5 (-1.4)


SLIDE 70

Results: Early Modern English

Error rate (%):
  Baseline 25.9, SCL 24.2, Brown 24.0, word2vec 24.1, FEMA (our method) 22.1 (-3.8)


SLIDE 73

Normalization vs. Representation Learning

Error rate (%):
  Baseline 25.9
  Spelling normalization (VARD) 23.3 (-2.6)
  Representation learning (FEMA) 22.1 (-3.8)
  Representation learning + normalization (FEMA + VARD) 21.0 (-4.9)

SLIDE 77

Error Analysis

  • Annotation inconsistencies and tagset mismatches

  token           annotations in PCHE                    annotations in PTB
  , (comma)       , (comma; 83.4%), . (period; 16.6%)    , (comma)
  . (period)      , (comma; 12.3%), . (period; 87.7%)    . (period)
  to              TO (54.6%), IN (44.3%)                 TO
  all/any/every   JJ (quantifier)                        DT
SLIDE 78

Conclusions

SLIDE 79

Conclusions

  • Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.

SLIDE 80

Conclusions

  • Representation learning and spelling normalization are complementary for improving tagging performance.
  • Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.

SLIDE 81

Conclusions

  • Representation learning and spelling normalization are complementary for improving tagging performance.
  • Tagset mismatches make it hard to evaluate modern POS taggers for historical English.
  • Feature embeddings outperform word embeddings by exploiting task-specific information in feature templates.