The LIG Arabic / English Speech Translation System at IWSLT07

Laurent BESACIER, Amar MAHDHAOUI, Viet-Bac LE
LIG*/GETALP (Grenoble, France)
Laurent.Besacier@imag.fr
* Former name: CLIPS

OUTLINE
1 Baseline MT system
- Task, data & tools
- Restoring punctuation and case
- Use of out-of-domain data
- Adding a bilingual dictionary
2 Lattice decomposition for CN decoding
- Lattice to CNs
- Word lattices to sub-word lattices
- What SRI-LM does
- Our algo.
- Examples in Arabic
3 Speech translation experiments
- Results on IWSLT06
- Results on IWSLT07 (eval)

1 Baseline MT system

Task, data & tools
- First participation in the IWSLT A/E task
- Conventional phrase-based system using Moses + GIZA++ + SRI-LM
- Use of IWSLT-provided data only (20k bitext), except:
  - An 84k A/E bilingual dictionary taken from http://freedict.cvs.sourceforge.net/freedict/eng-ara/
  - The Buckwalter morphological analyzer
  - LDC's Gigaword corpus (for English LM training)

Restoring punctuation and case
- 2 separate punctuation and case restoration tools built using the hidden-ngram and disambig commands from SRI-LM
=> restore MT outputs

        (1) train with    (2) train without   (3) train with restored
        case & punct      case & punct        case & punct
dev06   0.2341            0.2464              0.2298
tst06   0.1976            0.1948              0.1876

Option (2) kept

Use of out-of-domain data
- Baseline in-domain LM trained on the English part of the A/E bitext
- Interpolated LM between baseline and out-of-domain (LDC Gigaword): 0.7/0.3

        In-domain LM   Interpolated in-domain      Interpolated in-domain
        No MERT        and out-of-domain LM        and out-of-domain LM
                       No MERT                     MERT on dev06
dev06   0.2464         0.2535                      0.2674
tst06   0.1948         0.2048                      0.2050
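The 0.7/0.3 interpolation can be illustrated with a minimal sketch. It assumes plain linear interpolation and reads the weights in the order given on the slide (0.7 for the in-domain LM, 0.3 for the out-of-domain LM). The function name and the probability values are hypothetical, chosen only to show the arithmetic.

```python
import math


def interpolate(p_in: float, p_out: float, lam: float = 0.7) -> float:
    """Linearly interpolate two LM probabilities for the same word and history.

    lam is the weight of the in-domain model; (1 - lam) goes to the
    out-of-domain model. The 0.7 default follows the slide (assumed order).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    return lam * p_in + (1.0 - lam) * p_out


# Hypothetical probabilities of one word under each model.
p_in_domain = 0.020
p_out_domain = 0.005

p_mix = interpolate(p_in_domain, p_out_domain)
log10_mix = math.log10(p_mix)  # LM toolkits usually store log10 probabilities
print(f"interpolated p = {p_mix:.4f}, log10 = {log10_mix:.3f}")
```

The interpolated value always lies between the two component probabilities, so a word unseen in the small in-domain data can still receive probability mass from the larger corpus.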

Adding a bilingual dictionary
- An 84k A/E bilingual dictionary taken from http://freedict.cvs.sourceforge.net/freedict/eng-ara/
- Directly concatenated to the training data + retraining + retuning (MERT)

        No bilingual dict.   Use of a bilingual dict.
dev06   0.2674               0.2948
tst06   0.2050               0.2271

Submitted MT system (from verbatim trans.)
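A minimal sketch of what "directly concatenated to the training data" could look like, assuming each dictionary entry is appended as a one-line sentence pair to the two sides of the bitext. The function name and the toy entries (Arabic in Buckwalter transliteration) are hypothetical; the slides do not describe the actual file handling.

```python
from typing import List, Sequence, Tuple


def append_dictionary(
    src_sents: Sequence[str],
    tgt_sents: Sequence[str],
    dictionary: Sequence[Tuple[str, str]],
) -> Tuple[List[str], List[str]]:
    """Return new parallel lists with each dictionary entry added as a sentence pair."""
    if len(src_sents) != len(tgt_sents):
        raise ValueError("source and target sides must have the same length")
    src, tgt = list(src_sents), list(tgt_sents)
    for ar, en in dictionary:
        ar, en = ar.strip(), en.strip()
        if ar and en:  # skip entries with an empty side to keep the bitext aligned
            src.append(ar)
            tgt.append(en)
    return src, tgt


# Toy data for illustration only.
bitext_ar = ["Ayn AlmHTp"]
bitext_en = ["where is the station"]
toy_dict = [("ktAb", "book"), ("byt", "house"), ("", "orphan entry")]

train_ar, train_en = append_dictionary(bitext_ar, bitext_en, toy_dict)
print(len(train_ar), len(train_en))
```

After concatenation the alignment and phrase-extraction steps are rerun on the enlarged corpus, which is why the slide lists retraining and retuning.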

2 Lattice decomposition for CN decoding

Lattice to CNs
- Moses can use a confusion network (CN) as the interface between ASR and MT
- Example of a word lattice and a word CN

Word lattices to sub-word lattices
- Problem: the word graphs provided for IWSLT07 do not necessarily have a word decomposition compatible with the one used to train our MT models
  - Word units vs sub-word units
  - Different sub-word units used
=> Need for a lattice decomposition algorithm

What SRI-LM does
- Example: CANNOT split into CAN and NOT
- -split-multiwords option of lattice-tool
- First node keeps all the information
- New nodes have null scores and zero duration
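The behaviour described on this slide can be pictured with a small sketch. This is not SRI-LM code: the Arc class, the function name, and the timing and score values are hypothetical, and scores are assumed to be log-domain so that a "null" score is 0.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Arc:
    word: str
    start: float     # seconds
    end: float       # seconds
    acoustic: float  # log-domain acoustic score (assumption)
    lm: float        # log-domain LM score (assumption)


def split_keep_first(arc: Arc, parts: List[str]) -> List[Arc]:
    """Split an arc so the first part keeps all timing and score information.

    The remaining parts get zero duration (placed at the original end time)
    and zero scores, as the slide describes for -split-multiwords.
    """
    if not parts:
        raise ValueError("parts must not be empty")
    first = Arc(parts[0], arc.start, arc.end, arc.acoustic, arc.lm)
    rest = [Arc(p, arc.end, arc.end, 0.0, 0.0) for p in parts[1:]]
    return [first] + rest


arcs = split_keep_first(Arc("CANNOT", 1.00, 1.40, -250.0, -3.2), ["CAN", "NOT"])
for a in arcs:
    print(a)
```

The total scores and duration of the path are preserved, but the second unit carries no timing of its own.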

Proposed lattice decomposition algorithm (1)
- Identify the arcs of the graph that will be split (decompoundable words)
- Each arc to be split is decomposed into a number of arcs that depends on the number of subword units
- The start / end times of the arcs are modified according to the number of graphemes in each subword unit
- So are the acoustic scores
- The LM score of the first subword of the decomposed word is set to the initial LM score of the word, while the LM scores of the following subwords are set to 0
- Freely available at http://www-clips.imag.fr/geod/User/viet-bac.le/outils/
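The steps above can be sketched for a single arc as follows. This is an illustration, not the released tool: the Arc class and function name are hypothetical, scores are assumed to be log-domain, and "according to the number of graphemes" is read here as a split proportional to each subword's character count.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Arc:
    word: str
    start: float     # seconds
    end: float       # seconds
    acoustic: float  # log-domain acoustic score (assumption)
    lm: float        # log-domain LM score (assumption)


def decompose_arc(arc: Arc, subwords: List[str]) -> List[Arc]:
    """Split one word arc into consecutive subword arcs.

    Duration and acoustic score are shared in proportion to grapheme counts;
    the first subword keeps the word's LM score and the others get 0.
    """
    total = sum(len(s) for s in subwords)
    if total == 0:
        raise ValueError("subwords must contain at least one grapheme")
    duration = arc.end - arc.start
    pieces: List[Arc] = []
    t, consumed = arc.start, 0
    for i, sub in enumerate(subwords):
        consumed += len(sub)
        last = i == len(subwords) - 1
        # Use the cumulative count so the last piece ends exactly at arc.end.
        t_end = arc.end if last else arc.start + duration * consumed / total
        share = len(sub) / total
        pieces.append(Arc(sub, t, t_end, arc.acoustic * share, arc.lm if i == 0 else 0.0))
        t = t_end
    return pieces


# Hypothetical Buckwalter-transliterated word split into a prefix and a stem.
pieces = decompose_arc(Arc("wktAb", 0.0, 0.5, -100.0, -4.0), ["w", "ktAb"])
for p in pieces:
    print(p)
```

Unlike the keep-everything-on-the-first-node behaviour, every subword arc here has a non-zero duration and its own share of the acoustic score, while the path totals stay unchanged.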

Proposed lattice decomposition algorithm (2) (figure)

Examples in Arabic: word lattice (figure)

Examples in Arabic: sub-word lattice (figure)

3 Speech translation experiments

Results on IWSLT06
- Full CN decoding (subword CN as input)
  - obtained after applying our word lattice decomposition algorithm
  - all the parameters of the log-linear model used for the CN decoder were retuned on the dev06 set
  - "CN posterior probability parameter" to be tuned

        (1) verbatim   (2) 1-best   (3) cons-dec   (4) full-cn-dec
dev06   0.2948         0.2469       0.2486         0.2779
tst06   0.2271         0.1991       0.2009         0.2253

Submitted runs marked on the slide: ASR secondary, ASR primary

Results on IWSLT07 (eval)

        clean      ASR      ASR
        verbatim   1-best   full-cn-dec
Eva07   0.4135     0.3644   0.3804

AE ASR
1 XXXX BLEU score = 0.4445
2 XXXX BLEU score = 0.4429
3 XXXX BLEU score = 0.4092
4 XXXX BLEU score = 0.3942
5 XXXX BLEU score = 0.3908
6 LIG_AE_ASR_primary_01 BLEU score = 0.3804
7 XXXX BLEU score = 0.3756
8 XXXX BLEU score = 0.3679
9 XXXX BLEU score = 0.3644
10 XXXX BLEU score = 0.3626
11 XXXX BLEU score = 0.1420
