Motivation Generation Analysis
Learning Morphology from the Corpus
Ondřej Dušek
Institute of Formal and Applied Linguistics Charles University in Prague
November 11, 2013
Ondřej Dušek Learning Morphology from the Corpus 1/ 22
Generation: Introduction, The system, Results
for these languages: CS, EN, ES, CA, DE, JA
[masc] [dat]
[fem] [nom] [nom]
[at the end] [delete one letter] [and add these]
[at the end] [delete one letter] [and add these]
[at the beginning] [add this]
[at the end] [delete one letter] [and add these]
[at the beginning] [add this]
[5 letters from the end] [delete one letter] [and add this]
[at the end] [delete one letter] [and add these]
[at the beginning] [add this]
[replace the whole word]
[5 letters from the end] [delete one letter] [and add this]
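The bracketed edit operations above can be made concrete as a small data structure applied to a lemma. The following is a minimal sketch, not the system's actual representation. The class `EditScript`, its field names, and the order in which the operations are applied are all hypothetical. The example words (German "Mutter", "machen", English "go") are our own illustrations, not taken from the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EditScript:
    """Hypothetical encoding of the bracketed edit operations (lemma -> form)."""
    end_delete: int = 0               # [at the end] [delete N letters]
    end_add: str = ""                 # [and add these]
    prefix_add: str = ""              # [at the beginning] [add this]
    inner_pos: Optional[int] = None   # [N letters from the end]
    inner_delete: int = 0             # [delete one letter]
    inner_add: str = ""               # [and add this]
    whole_word: Optional[str] = None  # [replace the whole word]


def apply_script(lemma: str, script: EditScript) -> str:
    """Apply a hypothetical edit script to a lemma to produce an inflected form."""
    if script.whole_word is not None:
        # Suppletive forms ignore the lemma entirely.
        return script.whole_word
    word = lemma
    if script.inner_pos is not None:
        # In-word change; the position is counted from the end of the unmodified lemma.
        i = len(word) - script.inner_pos
        word = word[:i] + script.inner_add + word[i + script.inner_delete:]
    # Ending change: drop end_delete letters (0 keeps the word), then append.
    word = word[: len(word) - script.end_delete] + script.end_add
    # Prefix change comes last so it does not shift the end-relative positions.
    return script.prefix_add + word


print(apply_script("Mutter", EditScript(inner_pos=5, inner_delete=1, inner_add="ü")))  # Mütter
print(apply_script("machen", EditScript(end_delete=2, end_add="t", prefix_add="ge")))  # gemacht
print(apply_script("go", EditScript(whole_word="went")))                               # went
```

Counting the in-word position from the end, as the slide's "[5 letters from the end]" does, makes one script reusable across lemmas of different lengths that share an ending.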
[Figure: morphological features Pl (plural), Neut (neuter), Dat (dative)]
[Figure: accuracy (%) for English, German, and Czech; Total vs. Unseen forms]
[Figure: accuracy (%) vs. training data part (%) from 0.1 to 100, comparing Dictionary and Flect on all forms (Total) and on unknown forms; annotated error reductions of 58% and 76% (first panel) and 92% (remaining panels)]
Dict    Hajič   Flect
92.88   98.25   99.45
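The error-reduction figures are relative reductions of the error rate. The slide does not label the three numbers above, so the sketch below assumes they are accuracies in percent. Under that reading, the reduction from one system to another is computed as follows (the function name is hypothetical):

```python
def error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Relative error reduction in %, given two accuracies in % (assumed reading)."""
    baseline_err = 100.0 - baseline_acc
    new_err = 100.0 - new_acc
    return 100.0 * (baseline_err - new_err) / baseline_err


print(f"Dict -> Flect:  {error_reduction(92.88, 99.45):.1f} %")
print(f"Hajič -> Flect: {error_reduction(98.25, 99.45):.1f} %")
```

Dict to Flect gives (7.12 − 0.55) / 7.12 ≈ 92.3 %, which matches the 92% annotated in the plots. The slide does not state that both refer to the same comparison. Hajič to Flect gives (1.75 − 0.55) / 1.75 ≈ 68.6 %.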
Analysis: Introduction, Experiments, Results
[replace ending] [remove beginning]
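For analysis the direction flips: the lemma is obtained from the form by removing a beginning and replacing an ending. The sketch below derives such a rule from a (form, lemma) pair. Both the three-part encoding `LemmaRule` and the strategy of trying every removable beginning are our own simplifications, not necessarily what the system on the slides does.

```python
from typing import NamedTuple


class LemmaRule(NamedTuple):
    """Hypothetical form -> lemma rule: lemma = form[cut_start:len(form) - cut_end] + add."""
    cut_start: int  # [remove beginning]: characters dropped from the start of the form
    cut_end: int    # [replace ending]: characters dropped from the end of the form ...
    add: str        # ... replaced by this new ending


def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def derive_rule(form: str, lemma: str) -> LemmaRule:
    """Pick the beginning to remove that leaves the longest stretch shared with the lemma."""
    best_start, best_shared = 0, common_prefix_len(form, lemma)
    for start in range(1, len(form)):
        shared = common_prefix_len(form[start:], lemma)
        if shared > best_shared:  # strict '>' keeps the shortest removal on ties
            best_start, best_shared = start, shared
    cut_end = len(form) - best_start - best_shared
    return LemmaRule(best_start, cut_end, lemma[best_shared:])


def apply_rule(form: str, rule: LemmaRule) -> str:
    """Turn a form into a lemma using a derived rule."""
    return form[rule.cut_start:len(form) - rule.cut_end] + rule.add


print(derive_rule("nejrychlejší", "rychlý"))  # LemmaRule(cut_start=3, cut_end=4, add='ý')
print(derive_rule("nedělá", "dělat"))         # LemmaRule(cut_start=2, cut_end=1, add='at')
```

If the form and lemma share nothing, the rule degenerates to replacing the whole word. In that case `cut_start=0` and `cut_end=len(form)`, so `add` is the entire lemma.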
... "ebí": {"|NNNS1-----A----", "|NNNS6-----A----", ">1-it|VB-S---3P-AA---", ">1-it|VB-P---3P-AA---", "|Db-------------" }, ...
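The excerpt above appears to map a word ending to candidate analyses: a lemma rule, then `|`, then a positional tag. The sketch below shows how such an entry could be used at analysis time. It reads an empty rule as "the lemma equals the form" and `>1-it` as "strip one character from the end and append `it`". This reading of the notation is our interpretation, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One entry in the format of the excerpt above (suffix -> suggestions).
MEMO: Dict[str, List[str]] = {
    "ebí": [
        "|NNNS1-----A----",
        "|NNNS6-----A----",
        ">1-it|VB-S---3P-AA---",
        ">1-it|VB-P---3P-AA---",
        "|Db-------------",
    ],
}


def apply_lemma_rule(form: str, rule: str) -> str:
    """Assumed notation: '' keeps the form, '>N-xyz' strips N final chars and appends 'xyz'."""
    if not rule:
        return form
    if not rule.startswith(">"):
        raise ValueError(f"unsupported rule: {rule!r}")
    count, _, ending = rule[1:].partition("-")
    n = int(count)
    return form[: len(form) - n] + ending


def analyze(form: str, memo: Dict[str, List[str]], suffix_len: int = 3) -> List[Tuple[str, str]]:
    """Return every stored (lemma, tag) suggestion for the form's final suffix_len characters."""
    result = []
    for entry in memo.get(form[-suffix_len:], []):
        rule, _, tag = entry.partition("|")
        result.append((apply_lemma_rule(form, rule), tag))
    return result


for lemma, tag in analyze("zapotřebí", MEMO):
    print(lemma, tag)
```

The lookup overgenerates on purpose: for "zapotřebí" it returns all five suggestions, including a verb lemma "zapotřebit". Choosing among them is left to the tagger.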
analysis                        coverage (%)   avg. suggestions (ø)
Hajič (060406)                  98.82           3.85
Hajič (060406) + guesser        99.35           4.06
Hajič (131023)                  98.52           4.00
Hajič (131023) + guesser        99.01           4.18
Memo-Suffixes (len 4)           98.71           5.69
Memo-Suffixes (len 3)           99.30          11.83
Memo-Suffixes (len 4, thr 2)    98.07           4.75
Memo-Suffixes (len 3, thr 2)    98.91           9.27
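The table trades coverage against the number of suggestions. The sketch below shows one plausible way to build and evaluate a suffix memory with "len" and "thr" parameters. It assumes that `length` is the suffix length and that `thr` is the minimum number of training occurrences a suggestion needs to be kept. It also assumes coverage is the share of tokens whose correct analysis is among the suggestions. The rule encoding, all names, and the toy English data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Token = Tuple[str, str, str]  # (form, lemma, tag)


def lemma_rule(form: str, lemma: str) -> str:
    """Hypothetical encoding: '' if lemma == form, else '>N-ending'."""
    shared = 0
    for a, b in zip(form, lemma):
        if a != b:
            break
        shared += 1
    if shared == len(form) == len(lemma):
        return ""
    return f">{len(form) - shared}-{lemma[shared:]}"


def build_memo(train: Iterable[Token], length: int, thr: int = 1) -> Dict[str, Set[str]]:
    """Map each form suffix of the given length to 'rule|tag' suggestions seen >= thr times."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for form, lemma, tag in train:
        counts[form[-length:]][f"{lemma_rule(form, lemma)}|{tag}"] += 1
    return {suf: {s for s, c in cnt.items() if c >= thr} for suf, cnt in counts.items()}


def evaluate(memo: Dict[str, Set[str]], test: List[Token], length: int) -> Tuple[float, float]:
    """Return (coverage in %, average number of suggestions per token)."""
    covered = 0
    total_sugg = 0
    for form, lemma, tag in test:
        sugg = memo.get(form[-length:], set())
        total_sugg += len(sugg)
        if f"{lemma_rule(form, lemma)}|{tag}" in sugg:
            covered += 1
    return 100.0 * covered / len(test), total_sugg / len(test)


# Toy data, only to exercise the code.
train = [("walked", "walk", "VBD"), ("talked", "talk", "VBD"), ("naked", "naked", "JJ")]
test = [("stalked", "stalk", "VBD"), ("baked", "bake", "VBD")]
for thr in (1, 2):
    print(thr, evaluate(build_memo(train, length=3, thr=thr), test, length=3))
```

On the toy data, raising the threshold from 1 to 2 halves the average number of suggestions (2.0 to 1.0) without changing coverage (50.0 %). This mirrors in miniature the trade-off the "thr 2" rows of the table are presumably probing.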
analysis                                         tagger       tag     lemma   joint
Hajič (060406)                                   Featurama    95.38   99.27   95.29
Hajič (060406) + guesser                                      95.77   99.31   95.64
Hajič (131023)                                                95.15   99.13   94.95
Hajič (131023) + guesser                                      95.49   99.18   95.26
Milan Straka's tagger beta (131023)                           94.72   99.13   94.53
Milan Straka's tagger beta (131023) + guesser                 95.07   99.15   94.85
Morfette (trained on tamw only)                               89.79   97.65   89.39
Memo-Suffixes (len 4)                            Featurama    94.12   97.80   93.34
Memo-Suffixes (len 3)                                         94.28   96.84   92.59
Memo-Suffixes (len 4, thr 2)                                  93.64   97.86   93.09
Memo-Suffixes (len 3, thr 2)
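The three result columns are presumably token-level accuracies: correct tag, correct lemma, and both correct at once ("joint"). The sketch below makes that reading explicit. The function name and the toy tokens are hypothetical.

```python
from typing import List, Tuple

Analysis = Tuple[str, str]  # (lemma, tag)


def accuracies(gold: List[Analysis], pred: List[Analysis]) -> Tuple[float, float, float]:
    """Return (tag, lemma, joint) accuracy in %; joint = lemma and tag both correct."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    tag_ok = lemma_ok = joint_ok = 0
    for (g_lemma, g_tag), (p_lemma, p_tag) in zip(gold, pred):
        tag_match = g_tag == p_tag
        lemma_match = g_lemma == p_lemma
        tag_ok += tag_match
        lemma_ok += lemma_match
        joint_ok += tag_match and lemma_match
    n = len(gold)
    return 100.0 * tag_ok / n, 100.0 * lemma_ok / n, 100.0 * joint_ok / n


gold = [("walk", "VBD"), ("dog", "NN"), ("a", "DT")]
pred = [("walk", "VBD"), ("dog", "NNS"), ("an", "DT")]
print(accuracies(gold, pred))
```

Under this reading, joint accuracy can never exceed the lower of the tag and lemma accuracies. Every complete row of the table satisfies this.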
Bohnet, B. et al. (2010). Broad coverage multilingual deep sentence generation with a stochastic multi-level realizer. COLING.
Chrupała, G. et al. (2008). Learning morphology with Morfette. LREC.
Hajič, J. (2004). Disambiguation of rich inflection: Computational morphology of Czech. Karolinum.