Phrase processing for detecting collocations with KoKS* (PowerPoint presentation)

SLIDE 1

Phrase processing for detecting collocations with KoKS*

Norman Kummer, Joachim Wagner

*KoKS: Korpusbasierte Kollokationssuche (corpus-based search for collocations)

University of Osnabrück (Germany): KoKS-Project

SLIDE 2

contents

detection of phrases

identification of collocations

evaluation (results)

SLIDE 3

system overview

SLIDE 4

system overview

SLIDE 5

used bilingual corpora

DE-News

– radio news broadcast
– translated by volunteers

EU publications

– press releases
– political documents
– contracts

the four Harry Potter books

SLIDE 6

system overview

SLIDE 7

system overview

SLIDE 8

alignment of sentences (1/2)

distance measure

– bilingual dictionaries
– character trigrams to identify cognates
– sentence length

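The distance measure combines three signals. The following is a minimal sketch of how they could be combined for one candidate sentence pair. It assumes tokenized sentences, a toy word lexicon standing in for the bilingual dictionaries, a character-trigram Dice coefficient as the cognate test, and token counts as sentence length. The parameters `cognate_threshold`, `cognate_credit` and `length_weight` are hypothetical and do not come from the KoKS system.

```python
from collections import Counter
from typing import Dict, List, Set


def char_trigrams(word: str) -> Counter:
    """Multiset of character trigrams; '#' marks word boundaries (an assumption)."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def trigram_dice(a: str, b: str) -> float:
    """Dice coefficient of two trigram multisets, 1.0 for identical words."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    total = sum(ta.values()) + sum(tb.values())
    return 2 * sum((ta & tb).values()) / total if total else 0.0


def sentence_distance(
    src: List[str],
    tgt: List[str],
    lexicon: Dict[str, Set[str]],
    cognate_threshold: float = 0.5,  # hypothetical: minimum Dice score for a cognate
    cognate_credit: float = 0.5,     # hypothetical: a cognate counts half a dictionary hit
    length_weight: float = 0.25,     # hypothetical: share of the length signal
) -> float:
    """Distance in [0, 1]; lower means the sentences are more likely translations."""
    tgt_lower = {t.lower() for t in tgt}
    dict_hits = cognate_hits = 0
    for word in src:
        if lexicon.get(word.lower(), set()) & tgt_lower:
            dict_hits += 1  # a dictionary translation occurs in the target sentence
        elif any(trigram_dice(word, t) >= cognate_threshold for t in tgt):
            cognate_hits += 1  # no dictionary match, but a similar-looking word exists
    n = max(len(src), 1)
    word_sim = (dict_hits + cognate_credit * cognate_hits) / n
    length_sim = min(len(src), len(tgt)) / max(len(src), len(tgt), 1)
    similarity = (1 - length_weight) * word_sim + length_weight * length_sim
    return 1.0 - similarity
```

With a lexicon containing only stared/starrte and back/zurück, the pair "It stared back." / "Die Katze starrte zurück." receives a clearly lower distance than the same English sentence paired with an unrelated German one. Trigram overlap catches cognates such as Dementoren / dementors that a small dictionary may lack.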
SLIDE 9

alignment of sentences (2/2)

It stared back. Die Katze starrte zurück.

legend: open class words; translation found in the dictionary

– bilingual dictionaries
– character trigrams to identify cognates
– sentence length

SLIDE 10

system overview

SLIDE 11

detecting phrase correspondences (1/5)

POS tag sequences

– extracted from chunk-parsed monolingual corpora
– distinguished by syntactic category

example:

SLIDE 12

detecting phrase correspondences (2/5)

{The} school ’s {out} [party] was called {off}.
DT NN VBZ IN NN VBD VBN RP

{Die} [Fete] {zum} Ferienbeginn fiel {ins} Wasser.
ART NN APPART NN VVFIN APPART NN

chunk labels in the figure: NP, VP, PP

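The POS tag sequences can be collected per syntactic category from chunk-parsed text and then matched against tagged sentences to propose phrases. This is a sketch under stated assumptions. The input format (`Chunk`), the helper names, and the toy chunking of the training sentence are hypothetical. Exact matching of whole tag sequences is only one way such patterns might be applied.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]          # (word, POS tag)
Chunk = Tuple[str, List[Token]]  # (syntactic category, tokens); hypothetical input format
Patterns = Dict[str, Set[Tuple[str, ...]]]


def collect_patterns(parsed_corpus: List[List[Chunk]]) -> Patterns:
    """Collect the POS tag sequence of every chunk, grouped by syntactic category."""
    patterns: Dict[str, Set[Tuple[str, ...]]] = defaultdict(set)
    for sentence in parsed_corpus:
        for category, tokens in sentence:
            patterns[category].add(tuple(tag for _, tag in tokens))
    return dict(patterns)


def find_phrases(tagged: List[Token], patterns: Patterns) -> List[Tuple[str, str]]:
    """Return (category, phrase) for every contiguous span whose tags equal a known sequence."""
    tags = [tag for _, tag in tagged]
    found = []
    for category, sequences in patterns.items():
        for seq in sequences:
            for i in range(len(tags) - len(seq) + 1):
                if tuple(tags[i:i + len(seq)]) == seq:
                    words = " ".join(w for w, _ in tagged[i:i + len(seq)])
                    found.append((category, words))
    return found
```

Suppose the patterns come from a toy chunked sentence with NP = ART NN, VP = VVFIN and PP = APPART NN. Applied to the tagged German example, they yield the NP "Die Fete", the VP "fiel", and the PPs "zum Ferienbeginn" and "ins Wasser".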
SLIDE 13

detecting phrase correspondences (3/5)

POS tag sequences

– extracted from chunk-parsed monolingual corpora
– distinguished by syntactic category

pair matching phrases

example:

SLIDE 14

detecting phrase correspondences (4/5)

{The} school ’s {out} [party] was called {off}.
DT NN VBZ IN NN VBD VBN RP

{Die} [Fete] {zum} Ferienbeginn fiel {ins} Wasser.
ART NN APPART NN VVFIN APPART NN

chunk labels in the figure: NP, VP, PP; two matching phrases are linked as a "pair"

SLIDE 15

detecting phrase correspondences (5/5)

multiple NPs

identify non-literal phrases

no word alignment is used

all combinations are considered

a predefined number of references is required

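Because no word alignment is used, every combination of same-category phrases in an aligned sentence pair becomes a candidate. Only pairs seen in at least a predefined number of sentence pairs survive. This is a sketch that assumes phrases are already labeled with their category. `min_refs` is a hypothetical name for the predefined number of references, and counting each pair at most once per sentence pair is an assumption.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Phrase = Tuple[str, str]                          # (syntactic category, phrase text)
SentencePair = Tuple[List[Phrase], List[Phrase]]  # (source-side phrases, target-side phrases)


def pair_phrases(aligned: List[SentencePair], min_refs: int) -> Dict[Tuple[str, str, str], int]:
    """Count same-category phrase combinations over aligned sentence pairs.

    Every source phrase is paired with every target phrase of the same
    category (no word alignment). A pair is kept only if it occurs in at
    least `min_refs` sentence pairs, counted at most once per sentence pair.
    """
    counts: Counter = Counter()
    for src_phrases, tgt_phrases in aligned:
        combos = {
            (c1, p1, p2)
            for (c1, p1), (c2, p2) in product(src_phrases, tgt_phrases)
            if c1 == c2
        }
        counts.update(combos)
    return {pair: n for pair, n in counts.items() if n >= min_refs}
```

Considering all combinations may also explain how a pair such as die Tür / Harry can collect references. The two phrases only need to co-occur in aligned sentences, which a reference threshold alone does not rule out.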
SLIDE 16

system overview

SLIDE 17

collocativity measure

Breidt's definition of collocations

– compositional semantics

translation as semantics

distance measure used in sentence alignment

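One reading of "translation as semantics" is as follows. If each word of a phrase has a dictionary translation inside its aligned counterpart, the translation is compositional. The fewer words are covered, the more collocation-like the pair. This is a minimal sketch of that idea, assuming a toy word-level lexicon. It is not claimed to be the KoKS formula, which the slides do not spell out.

```python
from typing import Dict, List, Set


def compositionality(src: List[str], tgt: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Fraction of source words that have a dictionary translation inside the target phrase."""
    if not src:
        return 0.0
    tgt_lower = {w.lower() for w in tgt}
    covered = sum(1 for w in src if lexicon.get(w.lower(), set()) & tgt_lower)
    return covered / len(src)


def collocativity(src: List[str], tgt: List[str], lexicon: Dict[str, Set[str]]) -> float:
    """Higher means a less compositional translation, i.e. a more collocation-like pair."""
    return 1.0 - compositionality(src, tgt, lexicon)
```

With word-level entries for die, Tür, fiel, ins and Wasser, the pair die Tür / the door scores 0.0. The idiomatic fiel ins Wasser / was called off scores 1.0, because none of its literal translations appear on the English side.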
SLIDE 18

results

detecting phrase correspondences

collocativity measure

SLIDE 19

results (phrase detection) (1/3)

so far, we processed

– all sentences with at most 19 words
– approx. 70,000 sentence pairs

the next table shows examples

– ordered by frequency (f)

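The code below shows one way to restrict the input to short sentences and rank the extracted phrase pairs by frequency (f). The slide does not say whether the 19-word limit applies to one side or to both sides of a sentence pair. This sketch assumes both, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

MAX_WORDS = 19  # from the slide: sentences with at most 19 words


def short_pairs(pairs: Iterable[Tuple[List[str], List[str]]], max_words: int = MAX_WORDS):
    """Keep sentence pairs whose two sides both have at most max_words tokens (assumed)."""
    return [(s, t) for s, t in pairs if len(s) <= max_words and len(t) <= max_words]


def rank_by_frequency(phrase_pairs: Iterable[Tuple[str, str]]) -> List[Tuple[int, int, Tuple[str, str]]]:
    """Return (rank, f, pair) triples, most frequent first; ties keep first-seen order."""
    counts = Counter(phrase_pairs)
    return [(rank, f, pair) for rank, (pair, f) in enumerate(counts.most_common(), start=1)]
```

Ties receive consecutive ranks rather than shared ones, in line with the table, where ranks 22 and 23 both have f = 30.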
SLIDE 20

results (phrase detection) (2/3)

rank  f    German              English          correspondence
22    30   Professor           Dumbledore       bad
23    30   die Tür (the door)  Harry            bad
24    29   Professor           Professor Lupin  near
25    29   Schloss             the castle       good
...   ...  ...                 ...              ...
33    25   zu Harry            to Harry         good
34    24   will                do n't want      near
35    24   schien              seemed to be     good
36    24   ist                 do n't know      bad
37    24   sagte (said)        've got          bad
38    23   Dementoren          the dementors    good
39    22   Kammer              the Chamber      good

SLIDE 23

results (phrase detection) (3/3)

candidate set with f > 6

– does not contain any collocations according to Breidt (human annotators)
– a lot of compositional compounds
– only a few non-compositional translations

hence useless to apply the collocativity measure

SLIDE 24

results (collocativity measure) (1/6)

manually aligned phrase pairs

– 250 phrase pairs
– 83 with non-compositional translation
– 45 with non-compositional semantics (Breidt's definition of collocation)
– agreement of two annotators
– 31 unresolved disagreements

SLIDE 25

results (collocativity measure) (2/6)

variant  ignores words with high f  uses length of phrases
00       no                         only if very different
01       no                         always
10       yes                        only if very different
11       yes                        always

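The two-digit variant codes can be read as two switches. This hedged sketch shows how they could parameterize a coverage-based score. The first digit drops high-frequency words, supplied through `high_freq`, which is a hypothetical stoplist. The second digit decides whether a length term is always added or only when the phrase lengths differ by at least `length_ratio_limit`, a hypothetical threshold. The equal blending of the two terms is also an assumption.

```python
from typing import Dict, List, Set, Tuple


def variant_flags(code: str) -> Tuple[bool, bool]:
    """'00'..'11' -> (ignore high-frequency words?, always use phrase length?)."""
    return code[0] == "1", code[1] == "1"


def collocativity_variant(
    src: List[str],
    tgt: List[str],
    lexicon: Dict[str, Set[str]],
    high_freq: Set[str],              # hypothetical list of high-frequency words
    code: str,
    length_ratio_limit: float = 2.0,  # hypothetical "very different" threshold
) -> float:
    ignore_high_freq, length_always = variant_flags(code)
    words = [w for w in src if not (ignore_high_freq and w.lower() in high_freq)]
    tgt_lower = {w.lower() for w in tgt}
    covered = sum(1 for w in words if lexicon.get(w.lower(), set()) & tgt_lower)
    # share of the kept source words without a dictionary translation in the target
    score = 1.0 - covered / len(words) if words else 0.0
    ratio = max(len(src), len(tgt), 1) / max(min(len(src), len(tgt)), 1)
    if length_always or ratio >= length_ratio_limit:
        length_term = 1.0 - 1.0 / ratio  # 0.0 for equal lengths, approaching 1.0 as they diverge
        score = 0.5 * score + 0.5 * length_term  # hypothetical equal weighting
    return score
```

Consider die Tür / the gate, with die marked as high-frequency. Variants 00, 01, 10 and 11 give 0.5, 0.25, 1.0 and 0.5. Ignoring die removes the trivial die/the match, so only the uncovered Tür is left.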
SLIDE 26

results (collocativity measure) (3/6)

[Chart: precision (compositional translation) for measures 00, 01, 10 and 11; 250 candidates]

SLIDE 27

results (collocativity measure) (4/6)

[Chart: recall (compositional translation) for measures 00, 01, 10 and 11; 250 candidates]

SLIDE 28

results (collocativity measure) (5/6)

[Chart: precision (compositional semantics) for measures 00, 01, 10 and 11; 250 candidates]

SLIDE 29

results (collocativity measure) (6/6)

[Chart: recall (compositional semantics) for measures 00, 01, 10 and 11; 250 candidates]

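The charts plot precision and recall over the ranked candidates. This sketch shows how such curves are usually computed. It assumes the 250 candidates are sorted by one collocativity variant and that each candidate carries a gold label from the manual annotation. For instance, the relevant set could be the 83 pairs with non-compositional translation or the 45 with non-compositional semantics. The cutoffs are left to the caller, and the function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def precision_recall_at(ranked_labels: List[bool], n: int, total_relevant: int) -> Tuple[float, float]:
    """Precision and recall of the top-n candidates; ranked_labels[i] is True if candidate i is relevant."""
    top = ranked_labels[:n]
    hits = sum(top)
    precision = hits / len(top) if top else 0.0
    recall = hits / total_relevant if total_relevant else 0.0
    return precision, recall


def pr_curve(ranked_labels: List[bool], total_relevant: int,
             cutoffs: Iterable[int]) -> Dict[int, Tuple[float, float]]:
    """Precision/recall at each cutoff, e.g. every 10 candidates up to 250."""
    return {n: precision_recall_at(ranked_labels, n, total_relevant) for n in cutoffs}
```

Calling `pr_curve` once per variant, with the same gold labels, produces one line per measure as in the four charts.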
SLIDE 30

outlook (1/2)

improve phrase correspondences

– use proper chunking to find phrases
– use word alignment
– weight phrase pairs according to their correspondence probability
– replace simple counts with advanced statistics (association measures)
– exploit substring relations among phrases

SLIDE 31

outlook (2/2)

improve collocativity measure

– decompose compounds
– find translation equivalents across word classes
– better combine the different parts

SLIDE 32

discussion / questions / contact

Link:

http://www.cl-ki.uos.de/~koks/

Norman Kummer, norman@VauDePe.de
Joachim Wagner, jowagner@uos.de

University of Osnabrück Institute of Cognitive Science 49078 Osnabrück Germany

SLIDE 33

alignment of sentences (extra 1)

It stared back. Die Katze starrte zurück.

SLIDE 34

alignment of sentences (extra 2)

SLIDE 35

system overview

SLIDE 36

application (1/2)

CALL context: provides help to L2 learners in text understanding

web-based interface

SLIDE 37

application (2/2)

[Screenshot: current KoKS demo application]

SLIDE 38

other possible applications

intelligent lexicon lookup (iKoKS)

translation memory in CAT (computer-assisted translation)

full-text search based on the lemmas

SLIDE 39

results (phrase detection) extra slide

[Chart: Phrase Alignment; precision vs. number of references]

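The extra slide relates precision to the number of references. This sketch sweeps that threshold. It assumes each detected phrase pair comes with its reference count and a correctness judgment, such as "good" in the correspondence column of the results table. Whether "near" counts as correct is left to the caller. The function name and input format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def precision_by_min_refs(pairs: List[Tuple[int, bool]],
                          thresholds: List[int]) -> Dict[int, Optional[float]]:
    """For each threshold k, precision of the phrase pairs with at least k references.

    pairs: (number of references, judged correct?) for each detected phrase pair.
    Returns None for a threshold that no pair reaches.
    """
    result: Dict[int, Optional[float]] = {}
    for k in thresholds:
        kept = [ok for refs, ok in pairs if refs >= k]
        result[k] = sum(kept) / len(kept) if kept else None
    return result
```

Raising the threshold trades the number of surviving pairs for precision. This is also why recall, which the sweep does not show, matters for judging the method.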
As for the reason why we work manually, recall is actually what matters. (No collocation found, although some are presumably present.) -> move to the end, in case of questions