SLIDE 1

An Unsupervised Method for Uncovering Morphological Chains

Karthik Narasimhan, Regina Barzilay, Tommi Jaakkola
CSAIL, Massachusetts Institute of Technology

SLIDE 2

Morphological Chains

Chains to model the formation of words:

paint → painting → paintings

A richer representation than traditional schemes such as segmentation or paradigms.

SLIDE 3

Our Approach

Core Idea: Unsupervised discriminative model over pairs of words in the chain (paint → painting).

  • Orthographic features: Morfessor (Goldwater and Johnson, 2004; Creutz and Lagus, 2007), Poon et al. (2009), Dreyer and Eisner (2009), Sirts and Goldwater (2013)
  • Semantic features: Schone and Jurafsky (2000), Baroni et al. (2002)
  • Handles transformations (plan → planning)

SLIDE 4

Textual Cues

Orthographic: patterns in the characters forming words.

  paint, paints, painted
  pain, pains, pained

But orthography alone is misleading for pairs like pain/paint and ran/rant.

Semantic: meaning embedded as vectors. Cosine similarity separates true morphological relatives from look-alikes:

  A      B        cos(A, B)
  paint  paints   0.68
  paint  painted  0.60
  pain   pains    0.60
  pain   paint    0.11
  ran    rant     0.09

SLIDE 5

Task Setup

Training: an unannotated word list with frequencies, e.g.

  a        395134
  ability  17793
  able     56802
  about    524355

Word vector learning: a large text corpus (Wikipedia).

SLIDE 6

Multiple chains are possible for a word:

nation → national → international → internationally
nation → national → nationally → internationally

Different chains can share word pairs:

nation → national → international → internationally
nation → national → nationalize

SLIDE 7

Independence Assumption

Treat word-parent pairs separately.

A candidate (z) combines a word (w) with a parent (p) and a type (t), e.g. w = national, p = nation, t = suffix.

SLIDE 8

P(w, z) ∝ e^{θ·φ(w,z)}

Example candidate: w = national, p = nation, t = suffix.

Types: Prefix, Suffix, Transformation, Stop.
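To make the log-linear form concrete, here is a minimal sketch (not the authors' released code) of scoring and normalizing candidates for a single word. The `Candidate` record and the `extract_features` callback are hypothetical stand-ins for the paper's φ(w, z); normalizing over one word's candidates gives a conditional P(z | w) as used at prediction time, while training normalizes differently (see Contrastive Estimation below).

```python
import math
from collections import namedtuple

# A candidate z bundles a proposed parent with the type of derivation.
Candidate = namedtuple("Candidate", ["parent", "type"])  # type: prefix/suffix/transform/stop

def score(theta, features):
    """Unnormalized log-linear score theta . phi(w, z)."""
    return sum(theta.get(name, 0.0) * value for name, value in features.items())

def candidate_distribution(theta, word, candidates, extract_features):
    """Normalize exp(theta . phi(w, z)) over one word's candidates."""
    scores = [score(theta, extract_features(word, z)) for z in candidates]
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {z: e / total for z, e in zip(candidates, exps)}
```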

SLIDE 9

Transformations

  • Templates for handling changes in the stem during addition of affixes.
  • Repetition template: PQ → PQQR (for each Q in the alphabet). Ex. plan → planning (P = pla, Q = n, R = ing).
  • A feature template for each transformation.

SLIDE 10

Transformation types

3 different transformations:

  • Repetition (plan → planning)
  • Deletion (decide → deciding)
  • Modification (carry → carried)

Trade-off between the variety of transformations and computational tractability.

  • These three do well for a range of languages and are computationally tractable: at most O(|Σ|²) for alphabet Σ.

A sketch of how these templates generate candidate parents follows below.
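This is a minimal sketch, under assumed names, of how the three templates might propose parent candidates once a suffix has been stripped; a vocabulary filter keeps only guesses that are real words. It is illustrative, not the paper's implementation.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def transform_parents(word, suffix, vocab):
    """Propose parent candidates for `word` by stripping `suffix` and undoing
    one stem transformation. Generation only proposes; the model's features
    decide which candidate wins."""
    if not word.endswith(suffix):
        return []
    stem = word[: len(word) - len(suffix)]
    if len(stem) < 2:
        return []
    guesses = []
    # Repetition undone: plan -> plann + ing, so stem 'plann' yields 'plan'.
    if stem[-1] == stem[-2]:
        guesses.append(("repetition", stem[:-1]))
    # Deletion undone: decide -> decid + ing, so stem 'decid' yields 'decid' + c.
    guesses.extend(("deletion", stem + c) for c in ALPHABET)
    # Modification undone: carry -> carri + ed, so stem 'carri' yields 'carr' + c.
    guesses.extend(("modification", stem[:-1] + c) for c in ALPHABET if c != stem[-1])
    # Keep only guesses that are actual words.
    return [(kind, p) for kind, p in guesses if p in vocab]
```

For example, `transform_parents("deciding", "ing", {"decide"})` proposes the deletion candidate `decide`.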

SLIDE 11

Features φ(w, z)

Orthographic:

  • Affixes: indicator feature for top affixes
  • Affix correlation: pairs of affixes sharing a set of stems, e.g. (inter-, re-), (under-, over-)
  • Word frequency of the parent
  • Transformation types with character bigrams

Semantic:

  • Cosine similarity between word vectors of word and parent

[Plot: cosine similarity of candidate words with "player"]

A sketch of such a feature function follows below.
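This is a minimal sketch of a feature function in the spirit of this slide; the argument names and the handful of features kept here are assumptions, not the paper's full feature set.

```python
import math
import numpy as np

def cosine(u, v):
    """Cosine similarity between two word vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def extract_features(word, parent, affix, top_affixes, freq, vectors):
    """A few of the orthographic and semantic features above, as a sparse map."""
    feats = {}
    if affix in top_affixes:
        feats["affix=" + affix] = 1.0                       # indicator for top affixes
    if parent in freq:
        feats["log_parent_freq"] = math.log(freq[parent])   # word frequency of parent
    if word in vectors and parent in vectors:
        feats["cos(word, parent)"] = cosine(vectors[word], vectors[parent])
    return feats
```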

SLIDE 12

Learning

  • Objective:

    ∏_w P(w) = ∏_w ∑_z P(w, z) = ∏_w ∑_z [ e^{θ·φ(w,z)} / ∑_{w′∈Σ*, z′} e^{θ·φ(w′,z′)} ]

  • Optimize the likelihood using convex optimization: L-BFGS-B (with regularization).
  • Not tractable as written: calculating the normalization constant Z requires summing over all possible strings in the alphabet.

SLIDE 13

Contrastive Estimation

  • Instead, we use Contrastive Estimation (Smith and Eisner, 2005):
  • A neighborhood of invalid words for each word to take probability mass from.
  • Transpose pairs of adjacent characters within the first and last k characters of the word. (Ex. painting → paintnig)

    P(w, z) = e^{θ·φ(w,z)} / ∑_{w′∈N(w), z′} e^{θ·φ(w′,z′)}

A sketch of the neighborhood generation follows below.
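This is a minimal sketch of the neighborhood construction; the function name and the default k are assumptions made for illustration.

```python
def neighborhood(word, k=3):
    """Invalid neighbor strings: transpose one pair of adjacent characters
    within the first k or last k characters of the word."""
    neighbors = set()
    positions = set(range(min(k, len(word) - 1)))                      # first k chars
    positions |= set(range(max(0, len(word) - 1 - k), len(word) - 1))  # last k chars
    for i in positions:
        swapped = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        if swapped != word:                 # skip no-ops such as doubled letters
            neighbors.add(swapped)
    return neighbors
```

`neighborhood("painting")` contains "paintnig", the example above.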

SLIDE 14

Prediction

  • Predict the chain recursively (taking the argmax parent candidate each time) until STOP: paintings → painting → paint → STOP
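This is a minimal sketch of the greedy procedure, assuming a hypothetical `best_parent` callback that returns the argmax parent of a word, or None when STOP wins.

```python
def predict_chain(word, best_parent):
    """Follow argmax parent candidates until STOP; return the chain root-first."""
    chain = [word]
    parent = best_parent(word)
    while parent is not None:
        chain.append(parent)
        parent = best_parent(parent)
    return list(reversed(chain))   # e.g. ['paint', 'painting', 'paintings']
```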

SLIDE 15

Segmentation Experiments

  • Three languages: English, Arabic, Turkish
  • Evaluation: morphological segmentation; Precision, Recall, F1 over individual segmentation points (as in the sketch below)
  • Baselines: Morfessor-Baseline, Morfessor CatMAP, AGMorph (Sirts and Goldwater, 2013), and Lee et al. (2011)
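For concreteness, here is a minimal sketch of the metric; the input format (lists of morphs) and per-word computation are assumptions, and the paper's evaluation may aggregate differently.

```python
def segmentation_prf(gold, predicted):
    """Precision, recall, F1 over individual segmentation points.
    Segmentations are lists of morphs, e.g. ['paint', 'ing', 's']."""
    def cut_points(segments):
        cuts, pos = set(), 0
        for seg in segments[:-1]:          # no cut after the final morph
            pos += len(seg)
            cuts.add(pos)
        return cuts
    g, p = cut_points(gold), cut_points(predicted)
    correct = len(g & p)
    precision = correct / len(p) if p else 0.0
    recall = correct / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```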

SLIDE 16

F1 scores on MorphoChallenge

[Bar chart: F1 (roughly 22.5 to 90) for Morfessor-B, Morfessor-C, AGMorph, Lee (M2), and MorphoChain on English, Turkish, and Arabic]

SLIDE 17

Effect of data size

SLIDE 18

Affix Analysis

SLIDE 19

Error analysis

  • Errors (on a random subset of 50 words per language):

    Language  Over-segmented  Under-segmented
    English   10%             86%
    Turkish   12%             78%
    Arabic    60%             40%

  • Most errors in Turkish (58%) are due to parent words not being present or having low counts.
  • The root-template morphology of Arabic causes 14% of errors.

SLIDE 20

Sample segmentations

SLIDE 21

Morphology in Keyword Spotting

  • Keyword Spotting (KWS) is the task of identifying keywords in speech utterances.
  • Major issue: out-of-vocabulary (OOV) words.

[Bar chart: ATWV (roughly 0.05 to 0.2) for supervised vs. unsupervised (Morfessor) morphology on KWS-Test. KWS results on OOV keywords in Turkish LLP (Narasimhan et al., 2014)]

  • Adding morphemes helps KWS.
  • Better morphology can lead to better KWS (supervised vs. unsupervised).
  • Need for better unsupervised segmentation.
SLIDE 22

Morfessor vs. MorphoChain for KWS

  • MorphoChain outperforms a state-of-the-art unsupervised morphological system on KWS.

[Bar charts: ATWV scores on Bengali VLLP for Morfessor vs. MorphoChain, without web data (roughly 20.2 to 20.8) and with web data (roughly 23 to 26)]

*In collaboration with Damianos Karakos and Rich Schwartz at BBN.

SLIDE 23

Conclusions

  • A new method for unsupervised morphological analysis incorporating both orthographic and semantic features.
  • Equals or outperforms state-of-the-art systems on morphological segmentation.
  • Works well on downstream tasks.

Code: http://people.csail.mit.edu/karthikn/morphochain/
