Induction of Multilingual Morphology with only Minimal Supervision (PowerPoint presentation)


SLIDE 1

Introduction Task Definition Contextual Similarity Model Combination

Induction of Multilingual Morphology with only Minimal Supervision

Richard Wicentowski

Computer Science Department Swarthmore College

November 15, 2006

SLIDE 2

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination


SLIDE 4

Motivation: Machine Translation

Saint-Exupéry, Le Petit Prince, 1943

Bien sûr, dit le renard. Tu n’es pas encore pour moi qu’un petit garçon tout semblable à cent mille petits garçons. Et je n’ai pas besoin de toi. Et tu n’as pas besoin de moi non plus. Je ne suis pour toi qu’un renard semblable à cent mille renards. Mais, si tu m’apprivoises, nous aurons besoin l’un de l’autre. Tu seras pour moi unique au monde. Je serai pour toi unique au monde... Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

SLIDE 6

Motivation: Machine Translation

Saint-Exupéry, Le Petit Prince, 1943

Of course, known as the fox. You are not yet for me that a little boy very similar to a hundred and thousand small boys. And I do not need you. And you do not need me either. I am for you only one fox similar to a hundred and thousand foxes. But, if you tame me, we will need one the other. You will be for me single in the world. I will be for you single in the world... I start to include/understand, known as the small prince. It there be a flower... I believe that it me have tame...

SLIDE 8

Native Language       Speakers (millions)
Mandarin Chinese      867
Hindi                 400
Spanish               390
English               310
Standard Arabic       206
Indonesian            222
Bengali               194
Portuguese            177
Russian               145
Japanese              121
Persian               101
Punjabi               104
Javanese              76
German                75
Vietnamese            70
Telugu                70
Marathi               68
Tamil                 68
Korean                67
French                64
Urdu                  61
Italian               61
Turkish               60
Yoruba                47
Gujarati              46
Polish                46
Ukrainian             39
Malayalam             36
Kannada               35
Oriya                 32
Burmese               32
Thai                  31

SLIDE 9

Resources Needed for Machine Translation

What resources are needed to translate from Hindi to Bengali?
- A Hindi/Bengali dictionary
- Word translation in context (lexical choice)
- Morphological analyzers and generators
- Syntactic parsers / knowledge of grammar
And, if we wanted to do this translation from speech rather than written text, we’d also need speech recognizers...

SLIDE 11

Morphology and Lexical Choice in Machine Translation

Saint-Exupéry, Le Petit Prince, 1943

Bien sûr, dit le renard. Tu n’es pas encore pour moi qu’un petit garçon tout semblable à cent mille petits garçons. Et je n’ai pas besoin de toi. Et tu n’as pas besoin de moi non plus. Je ne suis pour toi qu’un renard semblable à cent mille renards. Mais, si tu m’apprivoises, nous aurons besoin l’un de l’autre. Tu seras pour moi unique au monde. Je serai pour toi unique au monde... Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

SLIDE 12

Dictionary coverage vs. Inflectional Degree

[Chart: dictionary coverage by type (roughly 10% to 45%) vs. average number of inflections per root (log scale, 1 to 100), for Turkish, Italian, Portuguese, French, Swedish, English, Spanish]

SLIDE 13

Morphology and Lexical Choice in Machine Translation

Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

Morphological Analysis

[Diagram: morphological analysis maps crois to the root croire, distinguishing inflections of croire (crussiez, croyez, crût, croyant) from orthographically similar verbs (croître, croquer, croiser, crotter, critiquer, croasser, ...)]

SLIDE 14

Morphology and Lexical Choice in Machine Translation

Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

Lexical Choice

[Diagram: after morphological analysis maps crois to croire, lexical choice selects among English translations of croire and its orthographic neighbors: believe, cross, suppose, consider, conceive, grow, criticize, ...]

SLIDE 15

Morphology and Lexical Choice in Machine Translation

Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...

Morphological Generation

[Diagram: after morphological analysis (crois → croire) and lexical choice (croire → believe), morphological generation produces the English inflections believe, believes, believed, believing]

SLIDE 16

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination

SLIDE 17

Task definition

Morphological Analysis
  Input: inflection
  Output: root, optional part of speech

Morphological Generation
  Input: root, part of speech
  Output: inflection

SLIDE 18

Task definition

Morphological Analysis
  Input: crois
  Output: croire, 2S Imperative; croire, 1S Present; croire, 2S Present

Morphological Generation
  Input: croire, Present Participle
  Output: croyant

SLIDE 19

Task definition

Morphological Analysis
  Input: burned
  Output: burn, Past Indicative; burn, Past Participle

Morphological Generation
  Input: burn, Past Indicative
  Output: burnt; burned
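The analysis/generation interface above can be sketched as a pair of lookup tables. This is an illustrative toy (the lexicon entries are typed in by hand, where the talk's system induces them), not the actual analyzer:

```python
# Toy analysis/generation interface; the class and method names are illustrative.
from collections import defaultdict

class Morphology:
    def __init__(self):
        self.analyses = defaultdict(set)   # inflection -> {(root, tag), ...}
        self.inflections = defaultdict(set)  # (root, tag) -> {inflection, ...}

    def add(self, inflection, root, tag):
        self.analyses[inflection].add((root, tag))
        self.inflections[(root, tag)].add(inflection)

    def analyze(self, inflection):
        """Analysis: inflection -> set of (root, part of speech)."""
        return self.analyses[inflection]

    def generate(self, root, tag):
        """Generation: root + part of speech -> set of inflections."""
        return self.inflections[(root, tag)]

m = Morphology()
m.add("burned", "burn", "Past Indicative")
m.add("burnt", "burn", "Past Indicative")
m.add("burned", "burn", "Past Participle")

m.analyze("burned")                    # {("burn", "Past Indicative"), ("burn", "Past Participle")}
m.generate("burn", "Past Indicative")  # {"burned", "burnt"}
```

Note how the burned/burnt example makes both directions one-to-many, which is why both tables map to sets.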

SLIDE 20

Inflectional morphological phenomena

affixation
  prefixation: geuza → mligeuza (Swahili)
  suffixation: adhair → adhairim (Irish)
  circumfixation: mischen → gemischt (German)
  infixation: palit → pumalit (Tagalog)
point-of-affixation: placer → plaça (French)
stem changes
  elision: close → closing (English)
  gemination: stir → stirred (English)
  voicing: zwerft → zwerven (Dutch)
vowel harmony
  abartmak → abartmasanız (Turkish)
  addetmek → addetmeseniz (Turkish)
internal vowel shift
  afbryde → afbrød (Danish)
  skrike → skreik (Norwegian)

SLIDE 21

Inflectional morphological phenomena

reduplication and agglutination
  reduplication: gupit → gugupit (Tagalog)
  agglutination: gupit → igugupit; gupit → ipagugupit; gupit → ipinagugupit
  agglutination: ev → evde (Turkish); evde → evdeki; evdeki → evdekiler
reduplication
  rumah → rumahrumah (Malay)
  ibu → ibuibu
root and pattern
  ktb → kateb (Arabic); ktb → kattab
highly irregular forms
  fi → erai (Romanian)
  jānā → gayā (Hindi)
  eiga → áttum (Icelandic)

SLIDE 22

Task definition

In order to perform morphological analysis, we must design an algorithm which can predict the root forms of inflections. There are three ways to approach the task using a machine-learning framework:

1. Supervised Learning: The algorithm is provided with training data, e.g. crois → croire.

2. Minimally Supervised Learning: The algorithm is provided some explicit information, but not in the form of training pairs, e.g. “This language is suffixal”, or “-ing is a productive suffix in this language”.

3. Unsupervised Learning: The algorithm is not provided with any explicit information; rather, information must be extracted from other sources, e.g. a large text corpus.

SLIDE 23

Supervised Machine Learning Algorithms

A class of algorithms designed to form generalizations from “training data” in order to make predictions about previously unseen data. For example, given this training data...

inflected verb    citation form
jumping           jump
singing           sing
burning           burn
...               ...

...we want to predict the citation form of an inflected verb:

inflected verb    citation form
fishing           ?
carting           ?
soaring           ?
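As a toy illustration of the supervised setting, one might learn suffix-rewrite rules from the training pairs and apply the longest matching rule to an unseen verb. This sketch assumes pure suffixation and is not the system described in the talk:

```python
# Learn suffix-rewrite rules (e.g. "ing" -> "") from (inflected, citation) pairs;
# an illustrative sketch that handles suffixation only.
def learn_rules(pairs):
    rules = {}
    for inflected, citation in pairs:
        # strip the longest common prefix; what remains is the suffix rewrite
        i = 0
        while i < min(len(inflected), len(citation)) and inflected[i] == citation[i]:
            i += 1
        rules[inflected[i:]] = citation[i:]
    return rules

def predict(word, rules):
    # prefer the longest inflected-suffix rule that matches
    for suffix in sorted(rules, key=len, reverse=True):
        if suffix and word.endswith(suffix):
            return word[: -len(suffix)] + rules[suffix]
    return word

rules = learn_rules([("jumping", "jump"), ("singing", "sing"), ("burning", "burn")])
predict("fishing", rules)   # "fish"
predict("soaring", rules)   # "soar"
```

All three training pairs collapse to the single rule "ing" → "", which then generalizes to fishing, carting, and soaring.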

SLIDE 24

Unsupervised Machine Learning Algorithms

A class of algorithms designed to make predictions about new data without explicit training data. For example, given a large text corpus...

... John wanted to go to the park so he could jump and sing. His mom told him “Don’t forget to put on sun tan lotion to keep from burning.” So John put on sun tan lotion and went off to the park. While jumping and singing in the park, the sun was very strong, but John didn’t burn because he’d remembered to put on sun tan lotion. ...

...we want to predict the citation form of an inflected verb:

inflected verb (inflection)    citation form (root)
fishing                        ?
carting                        ?
soaring                        ?

SLIDE 26

Iterative Retraining

[Flowchart: initial alignments → unsupervised models (model, parameters) → noisy alignments → supervised models (re-estimated model, parameters) → model combination → final stand-alone morphological analyzer]

SLIDE 27

(Representative) Previous work

Hand-built

Koskenniemi (1983): Finite-state transducers

Supervised Approaches

Rumelhart and McClelland (1986): Neural Networks
Mooney and Califf (1995): Inductive Logic Programming

Unsupervised Approaches

Kazakov (1997): Bootstrapping IDL
Brent (1993, 1999), Creutz (2005): Segmentation
Goldsmith (2001), Snover et al. (2002): Suffix learning

1. All of these approaches look for string changes, e.g. +ing
2. ...have limited support for stem changes, e.g. sleep → slept
3. ...and have very limited capabilities to handle highly irregular forms, e.g. be → is, was, were

SLIDE 28

Prior approaches to computational morphology

“...[O]ne might pose the question, does the young language learner – who has access not only to the spoken language, but perhaps also to the rudiments of syntax and to the intended meaning of the words and sentences – does the young learner have access to additional information that simplifies the task of morpheme identification? ...I think that such a belief is very likely mistaken. Knowledge of semantics and even grammar is unlikely to make the problem of morphology discovery significantly easier.”

• J. Goldsmith, “Unsupervised Learning of the Morphology of a Natural Language”, Computational Linguistics 27(2), 2001.

SLIDE 29

Prior approaches to computational morphology

John Goldsmith ’72 (Philosophy, Math, Economics)

SLIDE 30

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination

SLIDE 31

Semantic Similarity

Motivation
- Most inflectional variants are semantically similar to their citation form.
- Many semantically related words are not morphologically related: e.g. drink, sip, guzzle, quaff, etc.
- Many orthographically similar words are not semantically related: e.g. impact, impart, import, impose, improve, etc.
- Though it does happen: e.g. flare, flash, flame.
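A quick way to see the orthography/semantics mismatch is plain edit distance: impact and impart are a single edit apart yet unrelated in meaning, while drink and sip are close in meaning but far apart as strings. A minimal sketch (standard Levenshtein distance, not anything from the talk):

```python
# Levenshtein edit distance via the usual rolling-row dynamic program.
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

edit_distance("impact", "impart")   # 1: orthographically close, semantically unrelated
edit_distance("drink", "sip")       # 4: semantically related, orthographically distant
```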

SLIDE 32

Use (a very crude approximation of) semantics as part of an unsupervised solution to morphological analysis.

We will judge a word by the company it keeps: the semantics of a word can be determined by the words with which it co-occurs. Measure the (cosine of the) angle between context vectors: smaller angles indicate more semantic similarity than larger angles.

         hands   head   faith   violently   himself   kill   away
shook      128    103      21          17         …      …      …
shake      151     98       8          12         …      …      …
shoot        …      …       …           …         …      …      …
shoo         …      …       …           …         …      …      …
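The cosine measure itself is straightforward. A minimal sketch using the recoverable counts from the co-occurrence table above (illustrative only, not the talk's code):

```python
# Cosine of the angle between two co-occurrence vectors.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# counts over the context words hands, head, faith, violently
shook = [128, 103, 21, 17]
shake = [151, 98, 8, 12]
cosine(shook, shake)   # ~0.99: shook and shake keep very similar company
```

A cosine near 1 means a small angle between the vectors, i.e. the two words occur in very similar contexts; orthogonal vectors give cosine 0.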
SLIDE 33

[Plot: cos(θ) as θ ranges from 15° to 90°]
SLIDE 34

Contextual Similarity

Visualizing this similarity (repeated-bisection clustering)

SLIDE 35

Performance

We do not need (or expect) an inflection to be most contextually similar to its root. What do we expect out of the contextual clustering?

- Positive evidence for correct pairings
- Negative evidence for incorrect pairings

Show how often the single most similar word to the inflection was the root (“Top 1”) and how often the root was one of the top ten most similar words (“Top 10”).
SLIDE 37

Performance of Contextual Similarity

Language     Corpus Size (millions)   Top 1     Top 10
Russian      34                       17.93%    47.39%
Estonian     43                       18.87%    45.68%
French       16.6                     17.72%    37.41%
Danish       14                       13.66%    31.52%
Portuguese   4.5                      11.59%    30.34%
English      118                      11.68%    26.87%
Icelandic    25                       9.58%     26.13%
Spanish      58                       9.69%     24.01%
Basque       0.7                      8.08%     23.12%

SLIDE 38

Performance of Contextual Similarity

Language     Corpus Size (millions)   Top 1    Top 10
Polish       23                       5.99%    21.99%
Romanian     0.1                      6.56%    19.62%
Dutch        1.3                      4.95%    18.77%
German       19                       5.09%    14.62%
Finnish      0.6                      3.38%    13.53%
Czech        1.3                      3.47%    12.36%
Swedish      1.0                      3.32%    12.34%
Italian      46                       3.52%    12.33%
Swahili      0.5                      2.20%    10.06%

SLIDE 39

Context model sensitivity

The performance of the context model is sensitive to a number of parameters, including:
- The size of the window, e.g. should it include 5 words, 10 words, 20 words?
- The position of the window, e.g. should the window be centered on the word, or should it be off-center to the right or left?
- The words to include/exclude in the window, e.g. should we exclude punctuation?
- The choice of our corpus
Of importance here is not that the model is sensitive to these factors... What is important is: can we learn these parameters automatically?
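A sketch of how those window parameters might enter a context-vector builder; the function name, signature, and defaults are illustrative, not the talk's implementation:

```python
# Count co-occurrences of a target word within a (left x right) token window,
# optionally excluding stop tokens such as punctuation.
from collections import Counter

def context_vector(tokens, target, left, right, exclude=frozenset()):
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        counts.update(w for w in window if w not in exclude)
    return counts

tokens = "he shook his head and then he shook my hand".split()
ctx = context_vector(tokens, "shook", left=1, right=2)
# Counter({'he': 2, 'his': 1, 'head': 1, 'my': 1, 'hand': 1})
```

Varying `left` and `right` gives exactly the window shapes compared in the tables that follow: a left-only window (6x0), a centered one (3x3), or a right-only one (0x6).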

SLIDE 43

S-V-O        Left (6x0)   Center (3x3)   Right (0x6)
Spanish      6.37%        20.31%         29.38%
Portuguese   12.05%       26.58%         32.87%
French       9.08%        38.40%         45.60%
Italian      3.69%        9.99%          14.98%
Romanian     10.42%       18.71%         20.86%
English      13.25%       21.98%         25.67%
Danish       7.21%        24.61%         34.59%
Swedish      2.09%        10.36%         18.69%
Icelandic    10.93%       23.43%         29.98%
Estonian     31.87%       42.20%         44.60%   (1x5 window: 32.21%)
Finnish      5.40%        12.09%         12.15%
Tagalog      10.10%       15.08%         17.08%
Swahili      8.63%        8.68%          11.02%

SLIDE 44

Language      Left (6x0)   Center (3x3)   Right (0x6)
Free / S-V-O
Czech         3.30%        11.05%         11.02%
Polish        8.16%        18.91%         20.87%
Russian       19.91%       40.91%         47.35%
Verb Second (V2)
German        9.97%        14.78%         9.96%
Dutch         11.78%       16.32%         15.50%
S-O-V
Turkish       52.66%       44.40%         25.28%
Basque        25.88%       19.75%         6.44%

SLIDE 45

Outline

1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination

SLIDE 52

Choosing a final analysis

             Iteration
Language     0       1       2       3       4       5
English      88.5%   94.9%   95.8%   93.1%   97.5%   99.1%
Portuguese   96.0%   96.7%   97.0%   97.5%   97.6%   98.2%
German       93.1%   93.0%   94.4%   92.3%   93.3%   94.8%
Basque       86.3%   89.9%   90.8%   90.5%   93.8%   95.1%
Russian      79.7%   80.8%   81.6%   76.8%   78.9%   84.4%
Estonian     81.1%   83.8%   84.9%   85.9%   86.4%   88.2%
Turkish      85.7%   97.2%   97.7%   98.7%   99.1%   99.2%

Columns 0-4 show the precision of the supervised model, trained using the iteratively refined unsupervised models. Column 5 shows the precision of the final model, a combination of the unsupervised and supervised models.

SLIDE 53

How did we do?

It looks like we did pretty well, but without digging through the data to see the mistakes we’re making, it’s hard to know. We do, however, have one point of reference that might serve to show how well we are doing...

SLIDE 54

How did we do?

In English and Portuguese, we do better using unsupervised methods than using the fully supervised method! In Basque and Turkish, we were pretty close, too.

Language     Final Iteration   Fully Supervised
English      99.1%             99.1%
Portuguese   98.2%             97.9%
German       94.8%             97.9%
Basque       95.1%             96.0%
Russian      84.4%             90.8%
Estonian     88.2%             96.8%
Turkish      99.2%             99.5%

SLIDE 55

How did we do?

Notice that even the supervised learner gets a boost from the backoff to the unsupervised methods.

Language     Final Iteration   Fully Supervised   Supervised with Backoff
English      99.1%             99.1%              99.5%
Portuguese   98.2%             97.9%              98.9%
German       94.8%             97.9%              98.3%
Basque       95.1%             96.0%              97.4%
Russian      84.4%             90.8%              93.3%
Estonian     88.2%             96.8%              98.3%
Turkish      99.2%             99.5%              99.8%
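The backoff idea can be sketched very simply: use the supervised model's answer when it has one, and otherwise fall back to the unsupervised model. The function and the data here are hypothetical, chosen only to illustrate the combination:

```python
# Supervised-with-backoff combination: prefer the supervised analysis,
# fall back to the unsupervised one when the word was never seen in training.
def combined_analyze(word, supervised, unsupervised):
    return supervised.get(word) or unsupervised.get(word)

supervised = {"burning": "burn"}      # learned from labeled training pairs
unsupervised = {"fishing": "fish"}    # induced from raw text

combined_analyze("burning", supervised, unsupervised)   # "burn"  (supervised hit)
combined_analyze("fishing", supervised, unsupervised)   # "fish"  (backoff)
```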

SLIDE 56

How did we do?

Accuracy after final iteration on different types of inflections (where classification labels were available):

Language     All       Regular   Semi-Regular   Irregular   Obsolete/Other
English      99.05%    99.41%    99.50%         54.84%      100.0%
Portuguese   98.20%    98.31%    -              83.33%      20.00%
German       94.76%    97.83%    96.31%         75.60%      96.81%
Basque       95.14%    -         -              -           -
Russian      84.42%    -         -              -           -
Estonian     88.18%    -         -              -           -
Turkish      99.19%    99.96%    97.33%         19.35%      -

SLIDE 57

Recap

- Motivated the need for morphology using machine translation
- Highlighted the lack of MT resources
- Described the task of morphological analysis
- Presented a way of approaching the problem which can be applied to any language with large (electronically available) text sources
- Showed the success of the approach

SLIDE 58

Future Work

- Revise the supervised models (extend to new phenomena)
- Grammatical category (part of speech) acquisition for use in morphological generation
- Leverage the unsupervised models for lexical choice
- Release the source code (so others can improve on it)

SLIDE 59

Thanks

- Swarthmore College, for providing second-semester support through the James A. Michener Faculty Fellowship
- Connie Li ’06 and Scott Blaha ’07 (unsupervised models)
- Adrian Packel ’04 (acquisition of Hebrew data via OCR)
- Phil Katz ’07 and Matthew Singleton ’07 (lexical choice)