Introduction Task Definition Contextual Similarity Model Combination
Induction of Multilingual Morphology with only Minimal Supervision
Richard Wicentowski
Computer Science Department, Swarthmore College
November 15, 2006
Outline
1. Introduction
2. Task Definition
3. Contextual Similarity
4. Model Combination
Motivation: Machine Translation
Saint-Exupéry, Le Petit Prince, 1943
Bien sûr, dit le renard. Tu n’es pas encore pour moi qu’un petit garçon tout semblable à cent mille petits garçons. Et je n’ai pas besoin de toi. Et tu n’as pas besoin de moi non plus. Je ne suis pour toi qu’un renard semblable à cent mille renards. Mais, si tu m’apprivoises, nous aurons besoin l’un de l’autre. Tu seras pour moi unique au monde. Je serai pour toi unique au monde... Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...
Motivation: Machine Translation
Saint-Exupéry, Le Petit Prince, 1943
Of course, known as the fox. You are not yet for me that a little boy very similar to a hundred and thousand small boys. And I do not need you. And you do not need me either. I am for you only one fox similar to a hundred and thousand foxes. But, if you tame me, we will need one the other. You will be for me single in the world. I will be for you single in the world... I start to include/understand, known as the small prince. It there be a flower... I believe that it me have tame...
Native Language: Speakers (millions)

Mandarin Chinese: 867      Marathi: 68
Hindi: 400                 Tamil: 68
Spanish: 390               Korean: 67
English: 310               French: 64
Standard Arabic: 206       Urdu: 61
Indonesian: 222            Italian: 61
Bengali: 194               Turkish: 60
Portuguese: 177            Yoruba: 47
Russian: 145               Gujarati: 46
Japanese: 121              Polish: 46
Persian: 101               Ukrainian: 39
Punjabi: 104               Malayalam: 36
Javanese: 76               Kannada: 35
German: 75                 Oriya: 32
Vietnamese: 70             Burmese: 32
Telugu: 70                 Thai: 31
Resources Needed for Machine Translation
What resources are needed to translate from Hindi to Bengali?
- Hindi / Bengali dictionary
- Word translation in context (lexical choice)
- Morphological analyzers and generators
- Syntactic parsers / knowledge of grammar
And, if we wanted to do this translation from speech rather than written text, we’d also need speech recognizers...
Dictionary coverage vs. Inflectional Degree
[Chart: dictionary coverage by type (y-axis, 10%–45%) vs. average number of inflections per root (x-axis, log scale 1–100), plotting Turkish, Italian, Portuguese, French, Swedish, English, and Spanish.]
Morphology and Lexical Choice in Machine Translation
Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...
Morphological Analysis

[Diagram: the inflection crois must be mapped to its root croire, among other forms of croire (crussiez, croyez, crût, croyant) and orthographically similar but unrelated candidates (croître, croquer, croiser, crotter, critiquer, croasser, ...).]
Morphology and Lexical Choice in Machine Translation
Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...
Morphological Analysis, then Lexical Choice

[Diagram: after morphological analysis maps crois to croire, lexical choice must select the right English translation (believe, rather than suppose, consider, or conceive); the distractor roots would lead to wrong choices (croiser: cross, croître: grow, critiquer: criticize).]
Morphology and Lexical Choice in Machine Translation
Je commence à comprendre, dit le petit prince. Il y a une fleur... je crois qu’elle m’a apprivoisé...
Morphological Analysis, Lexical Choice, then Morphological Generation

[Diagram: after morphological analysis (crois → croire) and lexical choice (croire → believe), generation must produce the correctly inflected English form: believe, rather than believed, believes, or believing.]
Task definition
Morphological Analysis
- Input: inflection
- Output: root, optional part of speech

Morphological Generation
- Input: root, part of speech
- Output: inflection
Task definition
Morphological Analysis
- Input: inflection crois
- Output: root, optional part of speech:
  croire, 2S Imperative
  croire, 1S Present
  croire, 2S Present

Morphological Generation
- Input: root, part of speech: croire, Present Participle
- Output: inflection croyant
Task definition
Morphological Analysis
- Input: inflection burned
- Output: root, optional part of speech:
  burn, Past Indicative
  burn, Past Participle

Morphological Generation
- Input: root, part of speech: burn, Past Indicative
- Output: inflections burnt, burned
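The two directions above are one-to-many mappings. A minimal sketch of the interface, using only the slide's own examples as hard-coded entries (a real system induces these mappings rather than listing them):

```python
# Toy lookup tables illustrating the task interface; entries are the
# examples from the slide, not an actual induced model.
ANALYSES = {
    "burned": [("burn", "Past Indicative"), ("burn", "Past Participle")],
    "crois": [("croire", "2S Imperative"),
              ("croire", "1S Present"),
              ("croire", "2S Present")],
}
GENERATIONS = {
    ("burn", "Past Indicative"): ["burnt", "burned"],
    ("croire", "Present Participle"): ["croyant"],
}

def analyze(inflection):
    """Analysis: inflection -> list of (root, part-of-speech) candidates."""
    return ANALYSES.get(inflection, [])

def generate(root, pos):
    """Generation: (root, part-of-speech) -> list of surface inflections."""
    return GENERATIONS.get((root, pos), [])
```

Note that both directions are genuinely one-to-many: burned has two analyses, and (burn, Past Indicative) has two surface forms.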
Inflectional morphological phenomena
affixation
  prefixation: geuza → mligeuza (Swahili)
  suffixation: adhair → adhairim (Irish)
  circumfixation: mischen → gemischt (German)
  infixation: palit → pumalit (Tagalog)
  point-of-affixation: placer → plaça (French)
stem changes
  elision: close → closing (English)
  gemination: stir → stirred (English)
  voicing: zwerft → zwerven (Dutch)
vowel harmony
  abartmak → abartmasanız (Turkish)
  addetmek → addetmeseniz (Turkish)
internal vowel shift
  afbryde → afbrød (Danish)
  skrike → skreik (Norwegian)
Inflectional morphological phenomena
reduplication and agglutination
  gupit → gugupit → igugupit → ipagugupit → ipinagugupit (Tagalog)
  ev → evde → evdeki → evdekiler (Turkish)
reduplication
  rumah → rumahrumah (Malay)
  ibu → ibuibu
root and pattern
  ktb → kateb, kattab (Arabic)
highly irregular forms
  fi → erai (Romanian)
  jānā → gayā (Hindi)
  eiga → áttum (Icelandic)
Task definition
In order to perform morphological analysis, we must design an algorithm which can predict the root forms of inflections. There are three ways to approach the task using a machine-learning framework:
1. Supervised Learning: The algorithm is provided with training data, e.g. crois → croire.
2. Minimally Supervised Learning: The algorithm is provided some explicit information, but not in the form of training pairs, e.g. “This language is suffixal”, or “-ing is a productive suffix in this language”.
3. Unsupervised Learning: The algorithm is not provided with any explicit information; rather, information must be extracted from other sources, e.g. a large text corpus.
Supervised Machine Learning Algorithms
A class of algorithms designed to form generalizations from “training data” in order to make predictions about previously unseen data. For example, given this training data...

  inflected verb   citation form
  jumping          jump
  singing          sing
  burning          burn
  ...              ...

...we want to predict the citation form of an inflected verb:

  inflected verb   citation form
  fishing          ?
  carting          ?
  soaring          ?
Unsupervised Machine Learning Algorithms
A class of algorithms designed to make predictions about new data without explicit training data. For example, given a large text corpus...
... John wanted to go to the park so he could jump and sing. His mom told him “Don’t forget to put on sun tan lotion to keep from burning.” So John put on sun tan lotion and went off to the park. While jumping and singing in the park, the sun was very strong, but John didn’t burn because he’d remembered to put on sun tan lotion. ...
...we want to predict the citation form of an inflected verb:

  inflected verb (inflection)   citation form (root)
  fishing                       ?
  carting                       ?
  soaring                       ?
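A first step such an unsupervised learner might take is to collect, for each word of interest, counts of the words it co-occurs with in the corpus. A sketch (the corpus snippet and window size are illustrative):

```python
from collections import Counter

def context_vectors(tokens, targets, window=3):
    """For each target word, count the words appearing within
    `window` tokens on either side of it (a crude context model)."""
    vectors = {t: Counter() for t in targets}
    for i, tok in enumerate(tokens):
        if tok in vectors:
            lo, hi = max(0, i - window), i + window + 1
            vectors[tok].update(tokens[lo:i] + tokens[i + 1:hi])
    return vectors

text = ("john went to the park to jump and sing while jumping "
        "and singing in the park the sun was strong").split()
vecs = context_vectors(text, {"jumping", "singing"})
```

From counts like these, jumping and singing end up with context profiles resembling those of jump and sing, even though no inflection/root pairs were ever supplied.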
Iterative Retraining
[Diagram: the iterative retraining loop. Unsupervised models produce initial, noisy alignments; model combination with supervised models re-estimates the model parameters; iterating yields final alignments and a stand-alone morphological analyzer.]
(Representative) Previous work
Hand-built
Koskenniemi (1983): Finite-state transducers
Supervised Approaches
Rumelhart and McClelland (1986): Neural Networks
Mooney and Califf (1995): Inductive Logic Programming

Unsupervised Approaches

Kazakov (1997): Bootstrapping IDL
Brent (1993, 1999), Creutz (2005): Segmentation
Goldsmith (2001), Snover et al. (2002): Suffix learning
All of these approaches:
1. look for string changes, e.g. +ing
2. have limited support for stem changes, e.g. sleep → slept
3. have very limited capabilities to handle highly irregular forms, e.g. be → is, was, were
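To make the stem-change limitation concrete: a plain edit distance treats sleep → slept as just another two-character rewrite, with no notion that it is a systematic stem change. A minimal sketch (Levenshtein distance; the candidate lists are hypothetical):

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert/delete/substitute all cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def closest_root(inflection, candidates):
    """Rank candidate roots by orthographic distance alone."""
    return min(candidates, key=lambda r: edit_distance(inflection, r))

print(closest_root("slept", ["sleep", "walk", "sing"]))   # -> sleep
```

Orthographic distance works here only because the distractors are far away; for truly irregular pairs like be → was, string similarity offers no signal at all, which is why the talk turns to contextual evidence.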
Prior approaches to computational morphology
“...[O]ne might pose the question, does the young language learner – who has access not only to the spoken language, but perhaps also to the rudiments of syntax and to the intended meaning of the words and sentences – does the young learner have access to additional information that simplifies the task of morpheme identification? ...I think that such a belief is very likely mistaken. Knowledge of semantics and even grammar is unlikely to make the problem of morphology discovery significantly easier.”
- J. Goldsmith, “Unsupervised Learning of the Morphology of a Natural Language”, Computational Linguistics 27(2), 2001.
John Goldsmith ’72 (Philosophy, Math, Economics)
Semantic Similarity
Motivation
- Most inflectional variants are semantically similar to their citation form.
- Many semantically related words are not morphologically related: e.g. drink, sip, guzzle, quaff, etc.
- Many orthographically similar words are not semantically related: e.g. impact, impart, import, impose, improve, etc.
- Though it does happen: e.g. flare, flash, flame
Use (a very crude approximation of) semantics as part of an unsupervised solution to morphological analysis. We will judge a word by the company it keeps: the semantics of a word can be determined by the words with which it co-occurs. Measure the (cosine of the) angle between context vectors: smaller angles indicate more semantic similarity than larger angles.

[Table: co-occurrence counts for shook, shake, shoot, and shoo against the context words hands, head, faith, violently, himself, kill, and away. shook (128, 103, 21, 17, ...) and shake (151, 98, 8, 12, ...) have very similar context vectors; shoot and shoo do not.]

[Plot: cos(θ) for θ from 90° down to 15°, showing that smaller angles give cosines closer to 1.]
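The cosine computation itself is straightforward over sparse count vectors. A sketch, seeded with a few counts from the slide's table (the assignment of counts to particular context words is a guess from the garbled layout, and most cells are omitted):

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors (dicts)."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Partial counts from the slide; column assignment is an assumption.
shook = {"hands": 128, "head": 103, "faith": 21, "violently": 17}
shake = {"hands": 151, "head": 98, "faith": 8, "violently": 12}
shoot = {"head": 3, "himself": 56, "kill": 8}

print(cosine(shook, shake))   # near 1: small angle, similar contexts
print(cosine(shook, shoot))   # near 0: large angle
```

Even with only four shared context words, shook and shake come out almost parallel, while shook and shoot are nearly orthogonal: exactly the signal the method exploits.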
Contextual Similarity
Visualizing this similarity (repeated-bisection clustering)
Performance
We do not need (or expect) an inflection to be most contextually similar to its root. What do we expect out of the contextual clustering?
- Positive evidence for correct pairings
- Negative evidence for incorrect pairings
We show how often the single most similar word to the inflection was the root (“Top 1”) and how often the root was one of the ten most similar words (“Top 10”).
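The “Top 1” / “Top 10” measurement can be sketched as top-k accuracy over the similarity rankings (the rankings and gold roots below are hypothetical):

```python
def top_k_accuracy(ranked_neighbors, gold_roots, k):
    """Fraction of inflections whose true root appears among the
    k most contextually similar words."""
    hits = sum(1 for infl, neighbors in ranked_neighbors.items()
               if gold_roots[infl] in neighbors[:k])
    return hits / len(ranked_neighbors)

# Hypothetical similarity rankings (most similar word first).
neighbors = {
    "jumping": ["jump", "leap", "hop"],
    "singing": ["song", "sing", "dance"],
    "burning": ["fire", "smoke", "flame"],
}
gold = {"jumping": "jump", "singing": "sing", "burning": "burn"}

print(top_k_accuracy(neighbors, gold, k=1))   # 1/3: only jumping's root is first
print(top_k_accuracy(neighbors, gold, k=3))   # 2/3: singing's root recovered too
```

As in the real tables that follow, Top 10 is always at least as high as Top 1, since widening the window can only add hits.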
Performance of Contextual Similarity
Language     Corpus Size (millions)   Top 1    Top 10
Russian      34                       17.93%   47.39%
Estonian     43                       18.87%   45.68%
French       16.6                     17.72%   37.41%
Danish       14                       13.66%   31.52%
Portuguese   4.5                      11.59%   30.34%
English      118                      11.68%   26.87%
Icelandic    25                       9.58%    26.13%
Spanish      58                       9.69%    24.01%
Basque       0.7                      8.08%    23.12%
Performance of Contextual Similarity
Language     Corpus Size (millions)   Top 1    Top 10
Polish       23                       5.99%    21.99%
Romanian     0.1                      6.56%    19.62%
Dutch        1.3                      4.95%    18.77%
German       19                       5.09%    14.62%
Finnish      0.6                      3.38%    13.53%
Czech        1.3                      3.47%    12.36%
Swedish      1.0                      3.32%    12.34%
Italian      46                       3.52%    12.33%
Swahili      0.5                      2.20%    10.06%
Context model sensitivity
The performance of the context model is sensitive to a number of parameters, including:
- The size of the window, e.g. should it include 5 words, 10 words, 20 words?
- The position of the window, e.g. should the window be centered on the word, or should it be off-center to the right or left?
- The words to include/exclude in the window, e.g. should we exclude punctuation?
- The choice of our corpus
Of importance here is not that the model is sensitive to these factors... What is important is: Can we learn these parameters automatically?
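The size and position questions reduce to two parameters, which the later tables write as LxR (e.g. 3x3 is a centered window, 0x6 right-only, 6x0 left-only). A sketch, with punctuation filtering and other refinements omitted:

```python
def context_window(tokens, i, left, right):
    """The LxR window around position i: `left` tokens before the
    word and `right` tokens after it."""
    return tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]

toks = "the fox said you are not yet for me anything more".split()
i = toks.index("yet")                  # position 6
print(context_window(toks, i, 3, 3))   # 3x3 (centered)
print(context_window(toks, i, 0, 6))   # 0x6 (right-only)
print(context_window(toks, i, 6, 0))   # 6x0 (left-only)
```

Learning (left, right) automatically then amounts to searching this small parameter space for the setting that best recovers known pairings, which is what the retraining loop does.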
Language      Left 6x0   Center 3x3   Right 0x6
S-V-O
  Spanish     6.37%      20.31%       29.38%
  Portuguese  12.05%     26.58%       32.87%
  French      9.08%      38.40%       45.60%
  Italian     3.69%      9.99%        14.98%
  Romanian    10.42%     18.71%       20.86%
  English     13.25%     21.98%       25.67%
  Danish      7.21%      24.61%       34.59%
  Swedish     2.09%      10.36%       18.69%
  Icelandic   10.93%     23.43%       29.98%
  Estonian    31.87%     42.20%       32.21%     (1x5: 44.60%)
  Finnish     5.40%      12.09%       12.15%
  Tagalog     10.10%     15.08%       17.08%
  Swahili     8.63%      8.68%        11.02%
Language    Left 6x0   Center 3x3   Right 0x6
Free / S-V-O
  Czech     3.30%      11.05%       11.02%
  Polish    8.16%      18.91%       20.87%
  Russian   19.91%     40.91%       47.35%
Verb Second (V2)
  German    9.97%      14.78%       9.96%
  Dutch     11.78%     16.32%       15.50%
S-O-V
  Turkish   52.66%     44.40%       25.28%
  Basque    25.88%     19.75%       6.44%
Iterative Retraining

[Diagram, repeated across several slides with successive components highlighted: the unsupervised models produce initial, noisy alignments; model combination with the supervised models re-estimates the model parameters; iterating this loop yields the final alignments and a stand-alone morphological analyzer.]
Choosing a final analysis
              Iteration
Language    0       1       2       3       4       Final
English     88.5%   94.9%   95.8%   93.1%   97.5%   99.1%
Portuguese  96.0%   96.7%   97.0%   97.5%   97.6%   98.2%
German      93.1%   93.0%   94.4%   92.3%   93.3%   94.8%
Basque      86.3%   89.9%   90.8%   90.5%   93.8%   95.1%
Russian     79.7%   80.8%   81.6%   76.8%   78.9%   84.4%
Estonian    81.1%   83.8%   84.9%   85.9%   86.4%   88.2%
Turkish     85.7%   97.2%   97.7%   98.7%   99.1%   99.2%

Columns 0-4 show the precision of the supervised model, trained using the iteratively refined unsupervised models. The final column shows the precision of the final model, a combination of the unsupervised and supervised models.
How did we do?
It looks like we did pretty well, but without digging through the data to see the mistakes we’re making, it’s hard to know. But we have one point of reference that might serve to show how well we are doing...
How did we do?
In English and Portuguese, we do better using unsupervised methods than using the fully supervised method! In Basque and Turkish, we were pretty close, too.

Language    Final Iteration   Fully Supervised
English     99.1%             99.1%
Portuguese  98.2%             97.9%
German      94.8%             97.9%
Basque      95.1%             96.0%
Russian     84.4%             90.8%
Estonian    88.2%             96.8%
Turkish     99.2%             99.5%
How did we do?
Notice that even the supervised learner gets a boost from the backoff to the unsupervised methods.

Language    Final Iteration   Fully Supervised   Supervised with Backoff
English     99.1%             99.1%              99.5%
Portuguese  98.2%             97.9%              98.9%
German      94.8%             97.9%              98.3%
Basque      95.1%             96.0%              97.4%
Russian     84.4%             90.8%              93.3%
Estonian    88.2%             96.8%              98.3%
Turkish     99.2%             99.5%              99.8%
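The backoff in the last column can be sketched as: prefer the supervised model's analysis when it has one, otherwise fall back to the unsupervised analysis. This is a toy illustration with dicts standing in for the models; the actual combination also re-estimates model parameters iteratively:

```python
def analyze_with_backoff(supervised, unsupervised, word):
    """Use the supervised analysis when available; otherwise back off
    to the unsupervised one (dicts are toy stand-ins for the models)."""
    return supervised.get(word, unsupervised.get(word))

supervised = {"burned": "burn", "sang": "sing"}       # seen in training
unsupervised = {"burned": "burn", "soaring": "soar"}  # induced from raw text

print(analyze_with_backoff(supervised, unsupervised, "sang"))     # -> sing
print(analyze_with_backoff(supervised, unsupervised, "soaring"))  # -> soar
```

The supervised model keeps its precision on covered words while the unsupervised model fills coverage gaps, which is why the combined column dominates both.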
How did we do?
Accuracy after final iteration on different types of inflections (where classification labels were available):

Language    All      Regular   Semi-Regular   Irregular   Obsolete/Other
English     99.05%   99.41%    99.50%         54.84%      100.0%
Portuguese  98.20%   98.31%    -              83.33%      20.00%
German      94.76%   97.83%    96.31%         75.60%      96.81%
Basque      95.14%   -         -              -           -
Russian     84.42%   -         -              -           -
Estonian    88.18%   -         -              -           -
Turkish     99.19%   99.96%    97.33%         19.35%      -
Recap
- Motivated the need for morphology using machine translation
- Highlighted the lack of MT resources
- Described the task of morphological analysis
- Presented a way of approaching the problem which can be applied to any language with large (electronically available) text sources
- Showed the success of the approach
Future Work
- Revise the supervised models (extend to new phenomena)
- Grammatical category (part of speech) acquisition for use in morphological generation
- Leverage the unsupervised models for lexical choice
- Release the source code (so others can improve on it)