SLIDE 1

A Distributional and Orthographic Aggregation Model for English Derivational Morphology

Daniel Deutsch,* John Hewitt,* and Dan Roth

*equal contribution

SLIDE 2

Co-Authors

John Hewitt (Co-First Author)
Dan Roth (Advisor)

SLIDE 3

Derivational Morphology

employ → employer, employment
intense → intensely, intensity

SLIDES 4–5

Derivational Morphology

A derived word is produced by applying a transformation to a root word: root word + transformation → derived word (e.g., employ + -er → employer).

SLIDE 6

Motivation

  • Machine translation
  • Text simplification
  • Language generation
slide-7
SLIDE 7

7

Challenges

  • Suffix ambiguity
  • Orthographic irregularity
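Both challenges surface immediately with a naive baseline that appends one fixed suffix per transformation. This is a hypothetical strawman sketch, not a model from this talk:

```python
# Naive baseline (hypothetical): one fixed suffix per transformation.
NAIVE_SUFFIX = {"Adverb": "ly", "Result": "ation"}

def naive_derive(root, transformation):
    """Append the transformation's fixed suffix to the root."""
    return root + NAIVE_SUFFIX[transformation]

print(naive_derive("wise", "Adverb"))    # wisely: correct
print(naive_derive("ground", "Result"))  # groundation: unattested (suffix ambiguity)
print(naive_derive("speak", "Result"))   # speakation: should be speech (orthographic irregularity)
```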
SLIDES 8–10

Suffix Ambiguity

“I have an observament!”

Several suffixes compete to realize the same transformation, and only some outputs are attested:

ground + Result → grounding (not *groundation, *groundment, or *groundal)
valid + Nominal → validity (not *validness)

SLIDE 11

Orthographic Irregularity

The same transformation can surface with very different spellings across roots:

speak + Result → speech, but creak + Result → creaking (not *creech)
erupt + Result → eruption, but bankrupt + Result → bankruptcy (not *bankruption)

SLIDE 19

Model Overview

wise + Adverb → wisely, produced by aggregating two models:

  • distributional: handles orthographic irregularity
  • orthographic: handles suffix ambiguity

SLIDE 25

Orthographic Model

  • Seq2Seq baseline
  • Dictionary-constrained decoding
  • Reranking with frequency information
SLIDE 26

Seq2Seq Baseline

A character-level encoder-decoder: the encoder reads the root word as characters (c o m p o s e #) and the decoder emits the derived word one character at a time (# c o m p o s i …).

SLIDE 31
Dictionary-Constrained Decoding

  • Seq2Seq models generate many unattested words, but they are reasonable guesses
  • Intuition: constrain the model to generate only known words

Suffix ambiguity example: ground + Result → grounding (not *groundation, *groundment, or *groundal)

SLIDE 33

Dictionary-Constrained Decoding

Search over a trie induced from the dictionary: at each step, the decoder may only emit characters that extend a prefix of some dictionary word.

[Figure: character trie with nodes #, a, b, aa, ab, ba, bb, aba, abb, baa, bab, …; the search keeps only paths such as # → a → ab → aba that stay inside the trie]
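A minimal sketch of the idea, with a toy dictionary and a stand-in scorer in place of the real seq2seq model: decoding may only follow prefixes of dictionary words.

```python
END = "#"  # end-of-word symbol, as on the slides

def build_trie(words):
    """Map each legal prefix to the set of characters that may follow it."""
    trie = {}
    for w in words:
        for i in range(len(w) + 1):
            nxt = w[i] if i < len(w) else END
            trie.setdefault(w[:i], set()).add(nxt)
    return trie

def constrained_decode(score, trie, max_len=30):
    """Greedy decoding (beam size 1 for brevity) restricted to the trie."""
    prefix = ""
    while len(prefix) < max_len:
        allowed = trie.get(prefix, set())   # characters the dictionary permits
        if not allowed:
            break
        best = max(allowed, key=lambda c: score(prefix, c))
        if best == END:
            return prefix
        prefix += best
    return prefix

# Stand-in for the seq2seq scorer: favors the characters of "grounding".
def toy_score(prefix, c):
    target = "grounding"
    want = target[len(prefix)] if len(prefix) < len(target) else END
    return 1.0 if c == want else 0.0

trie = build_trie(["ground", "grounding", "groundwork"])
print(constrained_decode(toy_score, trie))  # grounding
```

With the constraint in place, unattested strings such as *groundation are simply unreachable, because no dictionary prefix licenses them.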

SLIDE 43

Reranking with Frequency Information

Input: refute + Result. The generator's candidates are rescored using corpus frequency:

Model Output | Model Score | Log Corpus Freq
refution     | 1.1         | 5.0
refutation   | 1.2         | 14.3
refut        | 4.8         | 7.4
refuty       | 5.6         | 0.1
refutat      | 8.7         | 8.6

Reranker Output | Reranker Score
refutation      | 0.5
refution        | 0.9
refut           | 0.9
refuty          | 0.9
refutat         | 0.9

The reranker promotes refutation, which is frequent in the corpus, over the generator's top guess refution.
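One simple way to realize this step is a weighted combination of the generator's score and log corpus frequency, using the numbers from the slide. This is a hedged sketch: the talk's reranker is learned, and the weight `alpha` here is invented.

```python
# Generator scores (lower = better) and log corpus frequencies from the slide.
model_score = {"refution": 1.1, "refutation": 1.2, "refut": 4.8,
               "refuty": 5.6, "refutat": 8.7}
log_freq = {"refution": 5.0, "refutation": 14.3, "refut": 7.4,
            "refuty": 0.1, "refutat": 8.6}

def rerank(scores, freqs, alpha=0.2):
    """Lower combined cost wins; corpus frequency reduces the cost."""
    return sorted(scores, key=lambda w: scores[w] - alpha * freqs[w])

print(rerank(model_score, log_freq)[0])  # refutation
```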

SLIDE 48

Model Overview (recap)

  • orthographic: suffix ambiguity
  • distributional: orthographic irregularity

SLIDE 50
Distributional Model

  • Orthographic information can be unreliable
  • The semantic transformation remains the same

speak + Result → speech, but creak + Result → creaking (not *creech)

SLIDE 51

Distributional Model

Intuition: learn a non-linear function per transformation, operating on word embeddings and independent of orthography.
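A sketch of one plausible instantiation (the network shape, the untrained weights, and the toy vocabulary are all assumptions): a small MLP per transformation maps a root's embedding to a predicted vector, and the derived word is read off as the nearest neighbor in embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["wise", "wisely", "quick", "quickly", "intense", "intensely"]
emb = {w: rng.normal(size=16) for w in vocab}   # toy stand-in embeddings

# Hypothetical, untrained parameters for one transformation (e.g. Adverb).
W1, b1 = rng.normal(size=(32, 16)), np.zeros(32)
W2, b2 = rng.normal(size=(16, 32)), np.zeros(16)

def transform(x):
    """Non-linear function for one transformation (one hidden layer)."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def nearest(vec):
    """Read off the word whose embedding is most similar to vec."""
    cos = lambda a, b: float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vec, emb[w]))

prediction = nearest(transform(emb["wise"]))  # with trained weights: "wisely"
```

Because the function never looks at characters, it is unaffected by spelling irregularities such as speak → speech.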

SLIDE 56

Model Overview (recap)

  • distributional: orthographic irregularity
  • aggregation combines the two models' outputs (wisely)

SLIDE 58

Aggregation Model

Target: approval

Orthographic model output: approvation (0.2)
Distributional model output: approval (0.1)

SLIDE 59
Aggregation Model

For each input, a selector compares the two models' scored candidates and keeps one (here, the lower score wins):

Orthographic (score) | Distributional (score) | Selected
approvation (0.9)    | approval (0.6)         | approval
bankruption (0.3)    | bankruptcy (0.8)       | bankruption
expertly (0.5)       | expertly (1.1)         | expertly
stroller (0.8)       | strolls (0.9)          | stroller
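The selection step can be sketched as picking, per input, the candidate with the lower score. This is a minimal stand-in for the learned selector; the root names are hypothetical and the scores come from the slide.

```python
# (candidate, score) per root from each model; lower score = preferred.
ortho = {"approve": ("approvation", 0.9), "bankrupt": ("bankruption", 0.3),
         "expert": ("expertly", 0.5), "stroll": ("stroller", 0.8)}
dist = {"approve": ("approval", 0.6), "bankrupt": ("bankruptcy", 0.8),
        "expert": ("expertly", 1.1), "stroll": ("strolls", 0.9)}

def aggregate(ortho, dist):
    """For each root, keep the candidate whose model scored it lower."""
    return {root: min(ortho[root], dist[root], key=lambda ws: ws[1])[0]
            for root in ortho}

print(aggregate(ortho, dist))
# {'approve': 'approval', 'bankrupt': 'bankruption',
#  'expert': 'expertly', 'stroll': 'stroller'}
```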

SLIDE 62

Experiments

SLIDE 63

Dataset (Cotterell et al. 2017)

Transformation | Count | Example
Adverb         | 1715  | wise → wisely
Result         | 1251  | simulate → simulation
Agent          | 801   | survive → survivor
Nominal        | 354   | recite → recital

Other example pairs: overstate → overstatement, yodel → yodeler, effective → effectiveness, pessimistic → pessimism, intense → intensity

SLIDE 64

Experiment Details

  • 30 random restarts
  • Token information: Google Books Ngrams
    – 360k unigram types
    – Token counts aggregated
  • Google News pre-trained word embeddings
  • Evaluation: full-token match accuracy
SLIDE 65

Results Legend

Seq2Seq (Seq), Distributional (Dist), Aggregation (Aggr), Dictionary-Constrained Decoding, Frequency-Based Reranking (+Freq)

SLIDES 66–73

Results

[Figure: token accuracy (40–85%) for Dist, Seq, Aggr, Seq+Freq, and Aggr+Freq, unconstrained and constrained, compared against Cotterell et al. 2017]

  • Significant improvement when combining Dist and Seq
  • Frequency statistics are a valuable signal
  • The combined model still outperforms the separate models
  • 22% and 37% relative error reductions over Seq

SLIDE 74

Results by Transformation

[Figure: token accuracy (10–100%) per transformation (Nominal, Result, Agent, Adverb) for Baseline, Aggr, and Aggr+Freq+Dict, compared against Cotterell et al. 2017]

Nominal can admit several acceptable outputs: equivalent → {equivalence, equivalency}

SLIDE 81

What does each model do well?

Examples where one model outperforms the other:

  • sponsor → sponsorship (not *sponsorment)
  • wise → wisely, quick → quickly (Adverb)
  • renew → renewal, invest → investment (Result)
  • inspire → inspiration (Result)

SLIDE 89

Conclusion

  • Aggregation model for English derivational morphology
  • Dictionary-constrained decoding
  • Frequency-based reranking
  • Per-transformation distributional model
  • Best open- and closed-vocabulary models demonstrate 22% and 37% error reduction
    – New state-of-the-art results

SLIDE 90

Code & Data

Code

https://github.com/danieldeutsch/derivational-morphology

Data

https://github.com/ryancotterell/derivational-paradigms

SLIDE 91

References

  • Cotterell et al. 2017. Paradigm Completion for Derivational Morphology. In EMNLP.

SLIDE 92

Thank you!