Overview of Morpho Challenge task at CLEF 2009



SLIDE 1

DEPARTMENT OF INFORMATION AND COMPUTER SCIENCE ADAPTIVE INFORMATICS RESEARCH CENTRE

Overview of Morpho Challenge task at CLEF 2009

Mikko Kurimo, Sami Virpioja, Ville Turunen Helsinki University of Technology (TKK)

SLIDE 2

Goals of the project

  • Design statistical machine learning algorithms that discover which morphemes words consist of
  • Find morphemes that are useful as vocabulary units for statistical language modeling in speech recognition, machine translation and information retrieval
  • Discover approaches suitable for a wide range of languages and tasks

SLIDE 3

Morpho Challenge summary

  • Part of the EU Network of Excellence PASCAL
  • Organized in collaboration with CLEF
  • Participation is open to all and free of charge
  • Data provided in: Finnish, English, German, Turkish and Arabic
  • Task: Implement an unsupervised algorithm that discovers morpheme analysis of words in each language!
  • Results: Evaluations in IR and SMT
  • Workshop: Corfu, Greece, September 30, 2009

SLIDE 4

The vocabulary problem

  • ASR, IR and SMT require a large vocabulary
  • Morphologically rich languages suffer from a severe vocabulary explosion
  • More efficient representation units needed

SLIDE 5

Agglutinative morphology

  • Finnish words typically consist of lengthy sequences of morphemes, i.e. stems, suffixes (and sometimes prefixes):

    – kahvi + n + juo + ja + lle + kin (coffee + of + drink + -er + for + also = ’also for [the] coffee drinker’)
    – nyky + ratkaisu + i + sta + mme (current + solution + -s + from + our = ’from our current solutions’)
    – tietä + isi + mme + kö + hän (know + would + we + INTERR + indeed = ’would we really know?’)
    – tietä + vä + mmä + lle (know + -ing + COMP + for = ’for the more knowing’ = ’for the one who knows more’)

SLIDE 6

Morfessor algorithm at TKK 2002

  • Automatic segmentation of words into morphemes
  • A fully data-driven unsupervised machine learning algorithm
  • Discovers a compact representation of the input text corpus
  • MAP optimization where the result resembles linguistic morphemes: left + hand + ed, hand + ful
  • Language independent, no morphological rules or annotated data needed
  • Toolkit available at http://www.cis.hut.fi/projects/morpho/

[PhD thesis of M. Creutz (2006)]

SLIDE 7

Morpho Challenge since 2005

  • Evaluation languages:

    – 2005: Finnish, Turkish, English
    – 2007: + German
    – 2008-2009: + Arabic

  • Evaluation tasks:

    – 2005: linguistic & speech recognition (ASR)
    – 2007-2008: linguistic & information retrieval (IR)
    – 2009: + machine translation (SMT)

SLIDE 8

History of Morpho Challenge

  • Participating groups:

    – 2005: 6 (+ 5 student groups)
    – 2007: 6
    – 2008: 6
    – 2009: 10

  • Type of submission:

    – 2005: words split into smaller units
    – 2007-2009: full morpheme analysis of words

SLIDE 9

Plan of 2009 Challenge

  • The participants submit their morpheme analyses
  • The organizers evaluate them in various ways:

    1. Comparison to a linguistic morpheme "gold standard"
    2. Information retrieval experiments, where the indexing is based on morphemes instead of entire words
    3. Machine translation experiments, where the translation is based on morphemes

SLIDE 14

Information Retrieval evaluation 2009

  • English, German and Finnish tasks
  • Words in the documents and queries were replaced by the suggested segmentations
  • If no segmentation was provided, the word was left unsegmented
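
The replacement step described above can be sketched roughly as follows. This is a minimal illustration, not the organizers' code: the function name `segment_text`, the lowercasing, and the toy analyses (loosely modelled on the "Method A" example slide) are all assumptions.

```python
def segment_text(text, segmentations):
    """Replace each word by its morphs; words without an analysis stay whole.

    `segmentations` is a hypothetical dict mapping a lowercased word to a
    list of morph strings.
    """
    out = []
    for word in text.lower().split():
        # Fall back to the unsegmented word when no analysis was submitted.
        out.extend(segmentations.get(word, [word]))
    return " ".join(out)


# Made-up analyses for illustration only.
segmentations = {
    "französische": ["französisch", "+e"],
    "atomtests": ["atom", "test", "+s"],
}

print(segment_text("Französische Atomtests 1995", segmentations))
# prints: französisch +e atom test +s 1995
```

The same function would be applied to both documents and queries, so that indexing and retrieval operate on the same morph vocabulary.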

SLIDE 15

Example

  • Query: Französische Atomtests
  • Doc 1: Ein zweiter französischer Atomtest fand mit 15-20 kt Sprengkraft...
  • Doc 2: Heim ist nicht automatisch ein gutes Heim...

SLIDE 16

Example: Method A

  • Query: französisch +e atom test +s
  • Doc 1: ein zwei +t +er französisch +er atom test fand mit 15-20 kt spreng kraft...
  • Doc 2: heim ist nicht automat isch ein gut +es heim...

SLIDE 17

Example: Method B

  • Query: fran zö sische a tom tes ts
  • Doc 1: ein z weiter fran zö sischer a tom test fand mit 15-20 kt spr eng kraf t...
  • Doc 2: heim ist nicht au tom a tisch ein gu tes heim...

SLIDE 18

Setup

  • LEMUR-toolkit: http://www.lemurproject.org/
  • Okapi BM25 ranking
  • Stoplist for the most common morphemes

    – a fixed threshold for corpus frequency

  • Evaluation metric is Mean Average Precision (MAP)
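
For reference, Mean Average Precision over binary relevance judgments can be computed along these lines. This is a generic sketch of the standard metric, not the LEMUR or CLEF evaluation code; the function names and the toy rankings are assumptions.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """`runs` is a list of (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy queries with made-up document ids.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
    (["d5", "d6"], {"d6"}),                    # AP = (1/2) / 1
]
print(round(mean_average_precision(runs), 4))  # prints 0.6667
```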

SLIDE 19

IR data sets (same as in 2007-2008)

  • Finnish (CLEF 2004)

    – 55K documents from articles in Aamulehti 1994-95
    – 50 test queries, 23K binary relevance assessments

  • English (CLEF 2005)

    – 107K documents from articles in Los Angeles Times 1994 and Glasgow Herald 1995
    – 50 test queries, 20K binary relevance assessments

  • German (CLEF 2003)

    – 300K documents from short articles in Frankfurter Rundschau 1994, Der Spiegel 1994-95 and SDA German 1994-95
    – 60 test queries, 23K binary relevance assessments

SLIDE 20

Reference methods

  • Morfessor Baseline: our public code since 2002
  • Morfessor Categories-MAP: improved, public 2006
  • dummy: no segmentation, all words unsplit
  • grammatical: full gold standard segmentation

    – all: all alternative segmentations included
    – first: only the first alternative chosen

  • TWOL: word normalization by a commercial rule-based morphological analyzer (all & first)
  • Snowball: language-specific stemming

SLIDE 21

English results

[Figure: bar chart of the English IR results; axis ticks from 0.2 to 0.4]

Submitted methods, in chart order: [Lignos et al.]*; [Virpioja & Kohonen] Allomorfessor; [Monson et al.] ParaMor Mimic; [Monson et al.] ParaMor-Morfessor Union; [Lavallée & Langlais] RALI-ANA*; [Monson et al.] ParaMor-Morfessor Mimic; [Tchoukalov et al.] MetaMorph*; [Lavallée & Langlais] RALI-COF*; [Bernhard] MorphoNet; [Golénia et al.] UNGRADE*; [Can & Manandhar]*; [Spiegler et al.] PROMODES*; [Spiegler et al.] PROMODES 2*; [Spiegler et al.] PROMODES committee*

Reference methods, in chart order: snowball porter; Best2008 (Monson Paramor+Morfessor); TWOL first; TWOL all; Morfessor Baseline; grammatical first; Morfessor CatMAP; grammatical all; dummy

SLIDE 22

English results (continued)

[Figure: the same chart as on the previous slide, with an added marker line labelled "No significant difference to the best above this line"]

SLIDE 23

German results

[Figure: bar chart of the German IR results; axis ticks from 0.2 to 0.45]

Submitted methods, in chart order: [Monson et al.] ParaMor-Morfessor Mimic; [Monson et al.] ParaMor-Morfessor Union; [Virpioja & Kohonen] Allomorfessor; [Can & Manandhar] 1*; [Lavallée & Langlais] RALI-COF*; [Can & Manandhar] 2*; [Lignos et al.]*; [Monson et al.] ParaMor Mimic; [Tchoukalov et al.] MetaMorph*; [Spiegler et al.] PROMODES committee*; [Golénia et al.] UNGRADE*; [Spiegler et al.] PROMODES*; [Lavallée & Langlais] RALI-ANA*; [Bernhard] MorphoNet; [Spiegler et al.] PROMODES 2*

Reference methods, in chart order: TWOL first; TWOL all; Best2008 (Monson Paramor+Morfessor); Morfessor Baseline; Morfessor CatMAP; snowball german; dummy; grammatical first; grammatical all


SLIDE 25

Finnish results

[Figure: bar chart of the Finnish IR results; axis ticks from 0.2 to 0.45]

Submitted methods, in chart order: [Monson et al.] ParaMor-Morfessor Union; [Virpioja & Kohonen] Allomorfessor; [Monson et al.] ParaMor-Morfessor Mimic; [Spiegler et al.] PROMODES 2*; [Monson et al.] ParaMor Mimic; [Lavallée & Langlais] RALI-COF*; [Bernhard] MorphoNet; [Golénia et al.] UNGRADE*; [Lavallée & Langlais] RALI-ANA*; [Spiegler et al.] PROMODES committee*; [Spiegler et al.] PROMODES*; [Tchoukalov et al.] MetaMorph*

Reference methods, in chart order: TWOL first; Best2008 (McNamee four); TWOL all; Morfessor CatMAP; Morfessor Baseline; grammatical first; snowball finnish; grammatical all; dummy


SLIDE 27

Discussion of the IR tasks

  • Results not improved from last year
  • Hard to achieve statistically significant differences
  • No clear winner
  • Strong in all languages:

    – "ParaMor-Morfessor Union" & "Mimic"
    – "Allomorfessor"

  • Full word list not submitted by all participants

    – Comparison is a bit more difficult

SLIDE 28

Conclusions

  • IR evaluations for 3 languages (out of 5)
  • Good results in all languages by several algorithms

=> Unsupervised morphological analysis is a viable approach for IR

  • Full report and papers in the CLEF proceedings
  • Details, presentations, links, info at: http://www.cis.hut.fi/morphochallenge2009/

SLIDE 29

Future directions

  • New languages: Russian, Indian languages, ...
  • New tasks: QA, speech synthesis, ...
  • New workshops: Venice, Budapest, Aarhus, Corfu, ...
  • New supporters: PASCAL, CLEF, EMIME, ...
  • New and improved learning algorithms
  • New participants, new application areas
  • Next workshop within ACL, NAACL or MLSP

SLIDE 30

More info on Morpho Challenge

  • Data, references, previous results: http://www.cis.hut.fi/morphochallenge2009/
  • Email Mikko.Kurimo @ tkk.fi to join the mailing list
  • Information on Morpho Challenge 2010 will become available within the next two months

SLIDE 31

Thanks

Thanks to all who made Morpho Challenge 2009 possible:

  • PASCAL network, CLEF, Leipzig corpora collection, Univ. Leeds, Univ. Haifa
  • Gold standard providers: Majdi Sawalha, Eric Atwell, Ebru Arisoy, Stefan Bordag and Mathias Creutz
  • Morpho Challenge organizing committee, program committee and evaluation team
  • Morpho Challenge participants
  • CLEF 2009 workshop organizers, especially Carol!

SLIDE 32

Unsupervised Morpheme Analysis Competition 3: Statistical Machine Translation

Mikko Kurimo, Sami Virpioja, Ville T. Turunen (TKK)
Graeme W. Blackwood, William Byrne (UCAM)

SLIDE 33

Morphology and SMT

  • Statistical machine translation systems find translation probabilities between words or sequences of words ("phrases").
  • Languages of rich morphology tend to be hard to translate both from and to

    – e.g. Finnish is one of the hardest among the EU languages.

  • Still an unsolved problem

SLIDE 34

Morph-based translation

  • Can unsupervised morphology learning directly improve SMT?

    – Reduces out-of-vocabulary rates (S. Virpioja, J. Väyrynen, M. Creutz & M. Sadeniemi, Morphology-aware statistical machine translation based on morphs induced in an unsupervised manner, MT Summit XI, 2007)
    – Improves translation results (A. de Gispert, S. Virpioja, W. Byrne, M. Kurimo, Minimum Bayes risk combination of translation hypotheses from alternative morphological decompositions, HLT-NAACL, 2009)
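
The out-of-vocabulary effect can be illustrated with a small token-level OOV computation. This is only a toy sketch: the function name `oov_rate` and the Finnish-like data and segmentations are made up for illustration and do not come from the cited papers.

```python
def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    if not test_tokens:
        return 0.0
    vocab = set(train_tokens)
    unseen = sum(1 for tok in test_tokens if tok not in vocab)
    return unseen / len(test_tokens)


# Hypothetical word-level data and a made-up morph segmentation of it.
train_words = ["kahvin", "juoja", "talossa"]
test_words = ["kahvinjuoja", "talon"]
train_morphs = ["kahvi", "n", "juo", "ja", "talo", "ssa"]
test_morphs = ["kahvi", "n", "juo", "ja", "talo", "n"]

print(oov_rate(train_words, test_words))    # prints 1.0: both test words are unseen
print(oov_rate(train_morphs, test_morphs))  # prints 0.0: every test morph was seen
```

Unseen word forms are often built from morphs that did occur in training, which is why segmentation tends to lower the OOV rate.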

SLIDE 35

Tasks and data

  • Europarl parallel corpus

    – Proceedings of the EU parliament meetings in 11 European languages

  • { Finnish, German } → English

    – Reducing OOV problems at the source side
    – Finnish: 479 780 word types
    – German: 270 038 word types

  • ~1 million sentences for training, <3000 for tuning, 3000 for testing

SLIDE 36

System overview

  • Evaluation based on combination of word-based and morph-based SMT systems (de Gispert et al., 2009)

SLIDE 37

Phrase-based SMT

  • One of the major advances in SMT methodology in this decade
  • Open source software: Moses (P. Koehn et al., 2007)
  • Main steps in building a system with Moses:

    – Word alignment (Giza++)
    – Phrase extraction and scoring
    – Building additional models (language model, reordering model, etc.)
    – Parameter tuning for decoder

SLIDE 38

MBR and system combination

  • Minimum Bayes Risk (MBR) decoding:

    – Select the translation hypothesis which maximises the conditional expected gain:

      \[ \hat{E} = \operatorname*{argmax}_{E' \in \mathcal{E}} \sum_{E \in \mathcal{E}} G(E, E') \, P(E \mid F) \]

  • System combination: generate N-best lists from different systems and find the best hypothesis with the MBR criterion
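
A rough sketch of MBR selection over a pooled N-best list is given below. It is not the system used in the evaluation: the function names, the posteriors, and the toy gain function (word-set overlap standing in for a sentence-level translation metric) are all assumptions chosen to keep the example small.

```python
def mbr_select(hypotheses, gain):
    """Pick the hypothesis with the highest expected gain.

    `hypotheses` is a list of (translation, posterior) pairs pooled from the
    N-best lists of one or more systems; posteriors are renormalised here.
    """
    total = sum(p for _, p in hypotheses)
    best, best_score = None, float("-inf")
    for candidate, _ in hypotheses:
        expected_gain = sum(
            gain(other, candidate) * p / total for other, p in hypotheses
        )
        if expected_gain > best_score:
            best, best_score = candidate, expected_gain
    return best


def word_overlap(reference, candidate):
    """Toy gain: Jaccard overlap of word sets (a stand-in, not BLEU)."""
    ref, cand = set(reference.split()), set(candidate.split())
    return len(ref & cand) / max(len(ref | cand), 1)


# Made-up pooled hypotheses with made-up posteriors.
pooled = [
    ("it is raining", 0.40),
    ("we need a solution", 0.35),
    ("we need one solution", 0.25),
]
print(mbr_select(pooled, word_overlap))  # prints: we need a solution
```

In this toy case the single most probable hypothesis is not selected, because the two similar hypotheses support each other under the gain function.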

SLIDE 39

MT evaluation

  • There are several metrics for automatic evaluation of MT systems.
  • BLEU score is based on co-occurrence of n-grams (n=1...4) in the proposed translation and the reference translation(s).
  • Usually consistent with human evaluations if the evaluated systems are similar
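
The n-gram matching behind BLEU can be sketched as below. This is a simplified sentence-level variant with a single reference and no smoothing, written for illustration; BLEU is normally computed over a whole test set, and the function names here are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Share of candidate n-grams also found in the reference (counts clipped)."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


def bleu(candidate, reference, max_n=4):
    """Geometric mean of n-gram precisions (n=1..max_n) times a brevity penalty."""
    precisions = [clipped_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # unsmoothed: any empty n-gram order gives zero
    log_mean = sum(math.log(p) for p in precisions) / max_n
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * math.exp(log_mean)


candidate = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
print(clipped_precision(candidate, reference, 1))  # 5 of 6 unigrams match
print(bleu(candidate, reference))
```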

SLIDE 40

Submissions to Competition 3

  • Bernhard - MorphoNet (MN)
  • Monson et al. - ParaMor Mimic (PM)
  • Monson et al. - ParaMor Morfessor Mimic (PMM)
  • Monson et al. - ParaMor Morfessor Union (PMU)
  • Virpioja & Kohonen - Allomorfessor (A)
  • Tchoukalov et al. - MetaMorph (MM)
  • Reference methods: Morfessor Baseline (MB), Morfessor CatMAP (MC), Grammatical (G)

SLIDE 41

Example translations (1)

[Figure: example translations for two systems: Words; Grammatical gold standard]

SLIDE 42

Example translations (2)

[Figure: example translations for two systems: Bernhard - MorphoNet; Monson et al. - ParaMor-Morfessor Union]

SLIDE 43

Example translations (3)

[Figure: example translations for two systems: Tchoukalov et al. - MetaMorph; Virpioja & Kohonen - Allomorfessor]

SLIDE 44

Results: Finnish

SLIDE 45

Results: German

SLIDE 46

Discussion

  • Sentences that are too long (>100 tokens) cannot be handled by Giza++.

    – Segmentation decreases the amount of training data.
    – Direct effect on performance

  • However, the average number of morphs per word does not explain the number of pruned sentences.
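
The pruning effect can be illustrated with a length filter applied before word alignment. This is a sketch under assumptions: the helper name `prune_long_pairs` is made up, the limit of 100 tokens is the figure quoted above, and applying it to both sides of a sentence pair is an assumption.

```python
MAX_TOKENS = 100  # length limit quoted on the slide


def prune_long_pairs(pairs, max_tokens=MAX_TOKENS):
    """Keep sentence pairs where both sides have at most `max_tokens` tokens."""
    return [
        (src, tgt)
        for src, tgt in pairs
        if len(src.split()) <= max_tokens and len(tgt.split()) <= max_tokens
    ]


# A made-up 60-word sentence pair, and the same source split into two morphs per word.
word_pair = (" ".join(["sana"] * 60), " ".join(["word"] * 60))
morph_pair = (" ".join(["sana", "+n"] * 60), word_pair[1])

print(len(prune_long_pairs([word_pair])))   # prints 1: 60 tokens, kept
print(len(prune_long_pairs([morph_pair])))  # prints 0: 120 tokens, pruned
```

A sentence that fits the limit as words can exceed it once segmented, so a segmentation that produces more tokens leaves less training data for the morph-based system.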

SLIDE 47

Conclusions

  • 6 submitted and 3 reference methods were tested on two machine translation tasks.
  • The 3-5 best methods improved the translation results over the baseline word-based system.
  • Some improvements are needed to make the comparison fairer.
  • Full report and papers in the CLEF proceedings
  • Details, presentations, links, info at: http://www.cis.hut.fi/morphochallenge2009/