Syntax-based Machine Translation using Multi Bottom-up Tree Transducers

  1. Syntax-based Machine Translation using Multi Bottom-up Tree Transducers
     Andreas Maletti
     Fabienne Braune, Daniel Quernheim, Nina Seemann
     Institute for Natural Language Processing, Universität Stuttgart, Germany
     maletti@ims.uni-stuttgart.de
     Uppsala, November 8, 2012
     Syntax-based MT using MBOT · A. Maletti

  2. Overview
     1 Motivation
     2 Extended Multi Bottom-up Tree Transducers
     3 The Theory
     4 The Application

  3. Motivation: Machine translation
     Translation
       Input: Official forecasts predicted just 3 percent, Bloomberg said.
       Reference: Offizielle Prognosen sind von nur 3 Prozent ausgegangen, meldete Bloomberg.
         [official] [forecasts] [are] [of] [only] [3 percent] [assumed] [reported] [Bloomberg]
       Our MBOT translator (untuned): offiziellen prognosen vorausgesagt nur 3 % bloomberg habe.
         [official] [forecasts] [*predicted] [only] [3 %] [Bloomberg] [*has]
       Google Translate (translate.google.com): Offizielle Prognosen vorausgesagt nur 3 Prozent, sagte Bloomberg.
         [official] [forecasts] [*predicted] [only] [3 percent] [said] [Bloomberg]

  6. Motivation: Machine translation
     Translation
       Input: The ECB wants to hold inflation to under two percent, or somewhere in that vicinity.
       Reference: Die EZB ist bestrebt, die Inflationsrate unter zwei Prozent, oder zumindest knapp an der Zwei-Prozent-Marke zu halten.
         [the] [ECB] [is] [desire] [the] [inflation rate] [below] [two percent] [or] [at least] [close] [at] [the] [two percent mark] [to keep]
       Google Translate (translate.google.com): Die EZB will die Inflation zu halten unter zwei Prozent, oder irgendwo in der Nähe.
         [the] [ECB] [wants] [the] [inflation] [*to keep] [below] [two percent] [or] [somewhere] [in] [the] [vicinity]

  8. Motivation: Syntax-based machine translation
     Remark: There is no universally accepted definition.
     Syntax-based systems:
       Input → Parser → Machine translation system → Language model → Output

  11. Motivation: What do we have?
      Input
        Parallel text (English and German): Europarl
        Parsers: BitPar, Charniak, Berkeley
      Example
        “We must bear in mind the Community as a whole.”
        “Wir müssen uns davor hüten, alles vergemeinschaften zu wollen.”
      Europarl German-English parallel data:
        1,920,209 parallel sentences
        44,548,491 words in German
        47,818,827 words in English
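The corpus figures above imply an average sentence length for each language. A small sketch of that arithmetic follows; the three counts are taken from the slide, while the function and variable names are only illustrative.

```python
# Corpus statistics as stated on the slide (Europarl German-English).
SENTENCES = 1_920_209
WORDS_DE = 44_548_491
WORDS_EN = 47_818_827


def average_length(words: int, sentences: int) -> float:
    """Average number of words per sentence."""
    return words / sentences


avg_de = average_length(WORDS_DE, SENTENCES)
avg_en = average_length(WORDS_EN, SENTENCES)

print(f"German:  {avg_de:.1f} words per sentence")
print(f"English: {avg_en:.1f} words per sentence")
```

By these counts the German side is roughly 23 words per sentence and the English side roughly 25.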

  13. Motivation: First step: Word Alignment
      Alignments by GIZA++ [Och, Ney ’03] (alignment links not preserved in this transcript):
        We must bear in mind the Community as a whole .
        Wir müssen uns davor hüten , alles vergemeinschaften zu wollen .
      and
        We can help countries catch up , but not by putting their neighbours on hold .
        Wir können Ländern beim Aufholen helfen , aber nicht , indem wir ihre Nachbarn in den Wartesaal schicken .
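A word alignment is commonly stored as a set of index pairs, often written in the `i-j` text format used by Moses-style pipelines (GIZA++'s native output format differs). The sketch below parses such a line for the first sentence pair. The three links in it are hypothetical, chosen only for illustration, because the actual alignment links of the slide are not preserved here.

```python
def parse_alignment(line: str) -> set[tuple[int, int]]:
    """Parse 'i-j' pairs: 0-based source index, 0-based target index."""
    links = set()
    for pair in line.split():
        src, tgt = pair.split("-")
        links.add((int(src), int(tgt)))
    return links


english = "We must bear in mind the Community as a whole .".split()
german = "Wir müssen uns davor hüten , alles vergemeinschaften zu wollen .".split()

# HYPOTHETICAL links for illustration only (not the alignment from the slide).
links = parse_alignment("0-0 1-1 10-10")

for src, tgt in sorted(links):
    print(f"{english[src]} -> {german[tgt]}")

# Source words without any link; every later step has to cope with these.
unaligned = [w for i, w in enumerate(english) if all(s != i for s, _ in links)]
print("unaligned:", unaligned)
```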

  14. Motivation: Second step: Parsing
      Charniak parser [Charniak, Johnson ’05]:
        (TOP
          (S
            (NP (PRP We))
            (VP (MD must)
              (VP (VB bear)
                (PP (IN in) (NP (NN mind)))
                (NP (NP (DT the) (NN Community))
                    (PP (IN as) (NP (DT a) (NN whole))))))
            (. .)))
      BitPar parser [Schmid ’06]:
        (TOP
          (S-TOP
            (NP-SB/Pl (PPER-HD-Nom.Pl Wir))
            (VMFIN-HD-Pl müssen)
            (VP-OC/inf
              (NP-DA (PPER-HD-Dat.Pl uns))
              (PP-OP/V (PROAV-PH davor))
              (VVINF-HD hüten)
              ($, ,)
              (VP-OC/zu
                (VP-OC/inf (NP-OA (PIS-HD-Acc.Sg.Neut alles)) (VVINF-HD vergemeinschaften))
                (VZ-HD (PTKZU-PM zu) (VMINF-HD wollen)))))
          ($. .))
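Parser output of this kind is usually exchanged in Penn Treebank bracket notation. The following is a minimal sketch of reading such a string into nested tuples and recovering the sentence from the leaves. The bracketing of the English tree is reconstructed from the flattened slide and should be treated as an assumption; `parse_tree` and `leaves` are illustrative helper names, not part of any parser's API.

```python
def parse_tree(text: str):
    """Parse a bracketed tree into (label, children) tuples; leaves are strings."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return the words of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    _, children = tree
    return [word for child in children for word in leaves(child)]


# Reconstructed (assumed) bracketing of the Charniak parse on the slide.
TREE = (
    "(TOP (S (NP (PRP We)) (VP (MD must) (VP (VB bear) "
    "(PP (IN in) (NP (NN mind))) "
    "(NP (NP (DT the) (NN Community)) (PP (IN as) (NP (DT a) (NN whole)))))) "
    "(. .)))"
)

tree = parse_tree(TREE)
print(" ".join(leaves(tree)))
```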

  15. Motivation: Second step: Parsing
      Charniak parser:
        (TOP
          (S
            (S
              (NP (PRP We))
              (VP (MD can)
                (VP (VB help)
                  (S (NP (NNS countries))
                     (VP (VB catch) (PRT (RP up)))))))
            (, ,)
            (CC but)
            (FRAG (RB not)
              (PP (IN by)
                (S (VP (VBG putting)
                       (NP (PRP$ their) (NNS neighbours))
                       (PP (IN on) (NP (NN hold)))))))
            (. .)))
      BitPar parser (the rest of the tree is elided on the slide):
        (TOP
          (CS-TOP
            (S-TOP
              (NP-SB/Pl (PPER-HD-Nom.Pl Wir))
              (VMFIN-HD-Pl können)
              (VP-OC/inf
                (NP-DA (NN-HD-Dat.Pl.Neut Ländern))
                (PP-MO/V (APPRART-AC-Dat.Sg.Neut beim) (NN-HD-Dat.Sg.Neut Aufholen))
                (VVINF-HD helfen)))
            ($, ,)
            ...)
          ($. .))

  16. Motivation: Equalizing examples
      Input
        Yugoslav President Voislav signed for Serbia.
          Transliteration: w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf.
        And then the matter was decided, and everything was put in place.
          Transliteration: f kAn An tm AlHsm w wDEt Al>mwr fy nSAb hA.
        Below are the male and female winners in the different categories.
          Transliteration: w hnA Al>wA}l w Al>wlyAt fy mxtlf Alf}At.

  17. Motivation: Equalizing examples
      Alignment
        Yugoslav President Voislav signed for Serbia
        w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf

  18. Motivation: Rule extraction
      English parse tree:
        (S
          (NP-SBJ (NML (JJ Yugoslav) (NNP President)) (NNP Voislav))
          (VP (VBD signed) (PP (IN for) (NP (NNP Serbia)))))
      Arabic parse tree (root S; the bracketing is not recoverable from the flattened slide):
        Words: w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf
        Node labels: CONJ, PV, DET-NN, PREP, NN-PROP, DET-ADJ, NP, PP, NP-OBJ, NP-SBJ, VP, S
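Rule extraction from aligned parse trees typically starts by asking which source spans map to target material without conflicting alignment links. The sketch below shows that standard consistency check on the example sentence. It is not necessarily the extraction procedure used in the talk, which this section does not spell out, and the alignment links are assumed for illustration because the slide's own links are not preserved.

```python
from typing import Optional

english = "Yugoslav President Voislav signed for Serbia".split()
arabic = "w twlY AltwqyE En SrbyA Alr}ys AlywgwslAfy fwyslAf".split()

# ASSUMED links (English index, Arabic index), for illustration only.
ALIGNMENT = {(0, 6), (1, 5), (2, 7), (3, 1), (3, 2), (4, 3), (5, 4)}


def target_span(start: int, end: int, alignment) -> Optional[tuple[int, int]]:
    """Smallest target span covering all links from source words start..end."""
    targets = [t for s, t in alignment if start <= s <= end]
    if not targets:
        return None
    return min(targets), max(targets)


def is_consistent(start: int, end: int, alignment) -> bool:
    """True if no target word inside the span links to a source word outside start..end."""
    span = target_span(start, end, alignment)
    if span is None:
        return False
    lo, hi = span
    return all(start <= s <= end for s, t in alignment if lo <= t <= hi)


# Subject NP "Yugoslav President Voislav" and VP "signed for Serbia".
for name, (start, end) in {"NP-SBJ": (0, 2), "VP": (3, 5)}.items():
    lo, hi = target_span(start, end, ALIGNMENT)
    print(name, "->", " ".join(arabic[lo:hi + 1]), is_consistent(start, end, ALIGNMENT))

# "Voislav signed" is not a constituent and its target span is not consistent.
print(is_consistent(2, 3, ALIGNMENT))
```

Under these assumed links the subject and the verb phrase swap order between English and Arabic, which is the kind of reordering a tree-based rule has to capture.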
