Phrase processing for detecting collocations with KoKS (Korpusbasierte Kollokationssuche)


  1. Phrase processing for detecting collocations with KoKS*
     Norman Kummer, Joachim Wagner
     *Korpusbasierte Kollokationssuche (corpus-based search for collocations)
     University of Osnabrück (Germany): KoKS-Project

  2. contents
     • detection of phrases
     • identification of collocations
     • evaluation (results)

  3. system overview [diagram]

  4. system overview [diagram]

  5. used bilingual corpora
     • DE-News
       – radio news broadcast
       – translated by volunteers
     • EU publications
       – press releases
       – political documents
       – contracts
     • the four Harry Potter books

  6. system overview [diagram]

  7. system overview [diagram]

  8. alignment of sentences (1 / 2)
     • distance measure
       – bilingual dictionaries
       – character trigrams to identify cognates
       – sentence length

  9. alignment of sentences (2 / 2)
     It stared back.
     Die Katze starrte zurück.
     [diagram labels: translation found in the dictionary; open class words]
     – bilingual dictionaries
     – character trigrams to identify cognates
     – sentence length
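
The three cues named for the distance measure (dictionary hits, cognates found through character trigrams, sentence length) could be combined roughly as in the sketch below. The slides do not say how KoKS weights or computes them, so everything here is an assumption made for illustration: the function names, the Dice coefficient over trigrams, the cognate threshold, the weights, and the toy dictionary.

```python
import re


def words(sentence):
    """Lower-case a sentence and split it into word tokens."""
    return re.findall(r"\w+", sentence.lower())


def char_trigrams(word):
    """Set of character trigrams; empty for words shorter than three characters."""
    return {word[i:i + 3] for i in range(len(word) - 2)}


def trigram_similarity(a, b):
    """Dice coefficient over character trigrams (assumed cognate score)."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    if not ta or not tb:
        return 0.0
    return 2 * len(ta & tb) / (len(ta) + len(tb))


def sentence_distance(src, tgt, dictionary, cognate_threshold=0.4,
                      weights=(0.5, 0.3, 0.2)):
    """Toy distance in [0, 1]; lower values suggest a better sentence pair."""
    s, t = words(src), words(tgt)
    if not s or not t:
        return 1.0
    # Share of source words with a dictionary translation in the target sentence.
    dict_hits = sum(1 for w in s if dictionary.get(w, set()) & set(t))
    # Share of source words with a cognate-like word in the target sentence.
    cognates = sum(1 for w in s
                   if any(trigram_similarity(w, v) >= cognate_threshold for v in t))
    # Ratio of the shorter to the longer sentence, counted in words.
    length_ratio = min(len(s), len(t)) / max(len(s), len(t))
    w_dict, w_cog, w_len = weights  # hypothetical weights, not from the slides
    similarity = (w_dict * dict_hits / len(s)
                  + w_cog * cognates / len(s)
                  + w_len * length_ratio)
    return 1.0 - similarity


# Hypothetical dictionary entries for the example sentence pair.
toy_dictionary = {"stared": {"starrte"}, "back": {"zurück"}}
print(sentence_distance("It stared back.", "Die Katze starrte zurück.", toy_dictionary))
```

In this toy setup "stared" and "starrte" share two trigrams, so they count as cognates in addition to being a dictionary hit.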

  10. system overview [diagram]

  11. detecting phrase correspondences (1 / 5)
      • POS tag sequences
        – extracted from chunk-parsed monolingual corpora
        – distinguished by syntactic category
      • example:

  12. detecting phrase correspondences (2 / 5)
      DT    NN     VBZ IN    NN      VBD VBN    RP
      {The} school ’s  {out} [party] was called {off}.
      NP VP
      ART   NN     APPART NN           VVFIN APPART NN
      {Die} [Fete] {zum}  Ferienbeginn fiel  {ins}  Wasser.
      NP PP VP
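
As a rough illustration of matching POS tag sequences against a tagged sentence, the sketch below scans left to right and takes the longest pattern that fits. The pattern inventory is invented for this example (the slides do not list the sequences extracted from the chunk-parsed corpora), `find_phrases` is a hypothetical name, and the resulting chunks are not claimed to reproduce the NP/VP bracketing shown on the slide.

```python
def find_phrases(tagged, patterns):
    """Greedy longest-match search for POS tag sequences.

    tagged:   list of (word, tag) tuples
    patterns: dict mapping a syntactic category to a list of tag tuples
    Returns a list of (category, [words]) in sentence order.
    """
    # Try longer patterns first so that "DT NN" wins over a bare "NN".
    flat = sorted(((tuple(p), cat) for cat, ps in patterns.items() for p in ps),
                  key=lambda item: -len(item[0]))
    tags = [tag for _, tag in tagged]
    found, i = [], 0
    while i < len(tagged):
        for pattern, category in flat:
            if tuple(tags[i:i + len(pattern)]) == pattern:
                found.append((category, [w for w, _ in tagged[i:i + len(pattern)]]))
                i += len(pattern)
                break
        else:
            i += 1  # no pattern starts here; move on by one token
    return found


# Hypothetical patterns, keyed by syntactic category.
patterns = {
    "NP": [("DT", "NN"), ("NN",)],
    "VP": [("VBD", "VBN", "RP")],
}
sentence = [("The", "DT"), ("school", "NN"), ("'s", "VBZ"), ("out", "IN"),
            ("party", "NN"), ("was", "VBD"), ("called", "VBN"), ("off", "RP")]
print(find_phrases(sentence, patterns))
```

Keying the patterns by category keeps NP and VP sequences apart, which mirrors the "distinguished by syntactic category" point.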

  13. detecting phrase correspondences (3 / 5)
      • POS tag sequences
        – extracted from chunk-parsed monolingual corpora
        – distinguished by syntactic category
      • pair matching phrases
      • example:

  14. detecting phrase correspondences (4 / 5)
      DT    NN     VBZ IN    NN      VBD VBN    RP
      {The} school ’s  {out} [party] was called {off}.
      NP VP
      pair  pair
      ART   NN     APPART NN           VVFIN APPART NN
      {Die} [Fete] {zum}  Ferienbeginn fiel  {ins}  Wasser.
      NP PP VP

  15. detecting phrase correspondences (5 / 5)
      • multiple NPs
      • identify non-literal phrases
      • no word alignment is used
      • all combinations are considered
      • a predefined number of references is required
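
One way to read these points is as co-occurrence counting: within each aligned sentence pair, every source phrase is paired with every target phrase, and a pair is kept only if it is seen often enough. The sketch below follows that reading. The data layout, the restriction to phrases of the same category, the function name `count_phrase_pairs`, and the `min_count` value are all assumptions, not details taken from the slides.

```python
from collections import Counter
from itertools import product


def count_phrase_pairs(aligned, min_count=2):
    """Count phrase pairs over aligned sentence pairs without word alignment.

    aligned: list of (source_phrases, target_phrases), where each phrase is a
             (category, text) tuple.
    Returns {(source_text, target_text): count} for pairs seen >= min_count times.
    """
    counts = Counter()
    for src_phrases, tgt_phrases in aligned:
        # All combinations are considered; a set makes each pair count once
        # per sentence pair.
        pairs = {(s, t)
                 for (s_cat, s), (t_cat, t) in product(src_phrases, tgt_phrases)
                 if s_cat == t_cat}
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_count}


# Toy data, invented for illustration.
aligned = [
    ([("NP", "the castle"), ("NP", "Harry")], [("NP", "Schloss"), ("NP", "Harry")]),
    ([("NP", "the castle")], [("NP", "Schloss")]),
    ([("NP", "Harry"), ("VP", "seemed to be")], [("NP", "Harry"), ("VP", "schien")]),
]
print(count_phrase_pairs(aligned))
```

With several NPs in a sentence, wrong pairs such as "the castle" / "Harry" are also counted, which is why a minimum number of supporting sentence pairs matters.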

  16. system overview [diagram]

  17. collocativity measure
      • Breidt's definition of collocations
        – compositional semantics
      • translation as semantics
      • distance measure used in sentence alignment
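
If translation stands in for semantics, a phrase pair looks compositional when its words translate one by one into the other phrase, and non-compositional when they do not. The sketch below scores this with the dictionary cue alone. It is a simplified stand-in, not the KoKS measure: the function name `noncompositionality`, the scoring rule, and the dictionary entries are assumptions, and the cognate and length cues are left out.

```python
def noncompositionality(src_phrase, tgt_phrase, dictionary):
    """Share of source words with no dictionary translation in the target phrase.

    0.0 means every word translates literally; 1.0 means none does.
    """
    src = src_phrase.lower().split()
    tgt = set(tgt_phrase.lower().split())
    if not src:
        return 0.0
    untranslated = [w for w in src if not (dictionary.get(w, set()) & tgt)]
    return len(untranslated) / len(src)


# Hypothetical dictionary entries.
toy_dictionary = {
    "fiel": {"fell"}, "ins": {"into"}, "wasser": {"water"},
    "das": {"the"}, "schloss": {"castle"},
}
print(noncompositionality("fiel ins Wasser", "was called off", toy_dictionary))
print(noncompositionality("das Schloss", "the castle", toy_dictionary))
```

A high score only flags a candidate; deciding whether it is a collocation in Breidt's sense still needs a human judgement.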

  18. results
      • detecting phrase correspondences
      • collocativity measure

  19. results (phrase detection) (1 / 3)
      • so far, we processed
        – all sentences with at most 19 words
        – approx. 70,000 sentence pairs
      • next table shows examples
        – ordered by frequency (f)

  20. results (phrase detection) (2 / 3)

      | rank | f  | German             | English         | correspondence |
      |------|----|--------------------|-----------------|----------------|
      | 22   | 30 | Professor          | Dumbledore      | bad            |
      | 23   | 30 | die Tür (the door) | Harry           | bad            |
      | 24   | 29 | Professor          | Professor Lupin | near           |
      | 25   | 29 | Schloss            | the castle      | good           |
      | ...  | ...|                    |                 |                |
      | 33   | 25 | zu Harry           | to Harry        | good           |
      | 34   | 24 | will               | do n't want     | near           |
      | 35   | 24 | schien             | seemed to be    | good           |
      | 36   | 24 | ist                | do n't know     | bad            |
      | 37   | 24 | sagte (said)       | 've got         | bad            |
      | 38   | 23 | Dementoren         | the dementors   | good           |
      | 39   | 22 | Kammer             | the Chamber     | good           |

  23. results (phrase detection) (3 / 3)
      • candidate set with f > 6
        – does not contain any collocations according to Breidt (human annotators)
        – a lot of compositional compounds
        – only a few non-compositional translations
      • useless to apply collocativity measure

  24. results (collocativity measure) (1 / 6)
      • manually aligned phrase pairs
        – 250 phrase pairs
        – 83 with non-compositional translation
        – 45 with non-compositional semantics (Breidt's definition of collocation)
        – agreement of two annotators
        – 31 unresolved disagreements

  25. results (collocativity measure) (2 / 6)

      | measure | ignores words with high f | uses length of variant phrases |
      |---------|---------------------------|--------------------------------|
      | 00      | no                        | only if very different         |
      | 01      | no                        | always                         |
      | 10      | yes                       | only if very different         |
      | 11      | yes                       | always                         |

  26. results (collocativity measure) (3 / 6)
      [plot: precision (compositional translation), y-axis 0.00 to 0.60, x-axis 0 to 100; one curve each for measure 00, 01, 10, 11]
      • 250 candidates

  27. results (collocativity measure) (4 / 6)
      [plot: recall (compositional translation), y-axis 0.00 to 1.00, x-axis 0 to 100; one curve each for measure 00, 01, 10, 11]
      • 250 candidates
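
Precision and recall for a ranked candidate list can be computed at a cutoff as in the sketch below. It assumes the candidates are sorted by the measure and that each carries a true/false label for the target class; the slides do not state what the x-axis of the plots is, so the cutoff `k` is an assumption, and the labels are invented.

```python
def precision_recall_at(ranked_labels, k):
    """Precision and recall of the top-k items in a ranked list of booleans."""
    top = ranked_labels[:k]
    true_positives = sum(top)
    total_positives = sum(ranked_labels)
    precision = true_positives / len(top) if top else 0.0
    recall = true_positives / total_positives if total_positives else 0.0
    return precision, recall


# Invented labels: True marks a candidate that belongs to the target class.
labels = [True, False, True, True, False, False]
for k in (1, 3, 6):
    print(k, precision_recall_at(labels, k))
```

Evaluating this at increasing cutoffs gives one precision curve and one recall curve per measure.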
