  1. 11,001 NEW FEATURES FOR STATISTICAL MACHINE TRANSLATION
     David Chiang, Kevin Knight, Wei Wang

  2.-5. MOTIVATION [Figure, built up across four slides: the word-aligned example "Maria no dió una bofetada a la bruja verde" / "Maria did not slap the green witch", annotated with a target-side parse tree (NNP, RB, VBD, DT, JJ, NN under NP, VP, S)]

  6. MOTIVATION
     • Minimum error rate training (MERT) works for <30 features
     • Margin-infused relaxed algorithm (MIRA):
       • online large-margin discriminative training
       • scales better to large feature sets
       • enables freer exploration of features

  7. RESULTS (GALE 2008 Chinese-English data)
     System   Training   Features   BLEU
     Hiero    MERT           11     36.1
     Hiero    MIRA       10,990     37.6
     Syntax   MERT           25     39.5
     Syntax   MIRA          283     40.6

  8. OVERVIEW
     • Training
     • Features
     • Experiments

  9. Training

  10. MIRA
      • Crammer and Singer, 2003
      • Applied to statistical MT by Watanabe et al., 2007
      • Chiang, Marton, and Resnik, 2008:
        • use more of the forest
        • parallelize training

  11.-12. MERT [Figure: plot of BLEU against model score illustrating MERT]

  13. MIRA [Figure: BLEU vs. model score plot, annotated with the loss and the margin that MIRA enforces]

  14. MIRA [Figure: BLEU vs. model score plot of the MIRA update]
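
The plots above gesture at the constrained update MIRA makes for each training sentence. As a rough sketch (following the general form used by Crammer and Singer 2003 and Watanabe et al. 2007; the exact loss and constraint set used in this work may differ), given current weights w_t, an oracle translation y* and competing hypotheses y_1, ..., y_k for sentence x:

    % MIRA update for one sentence x, with oracle translation y* and
    % competing hypotheses y_1, ..., y_k; \ell_j is a BLEU-based loss of y_j.
    w_{t+1} = \arg\min_{w} \frac{1}{2}\,\lVert w - w_t \rVert^2 + C \sum_{j=1}^{k} \xi_j
    \quad \text{s.t.} \quad
    w \cdot \bigl( f(x, y^*) - f(x, y_j) \bigr) \;\ge\; \ell_j - \xi_j, \qquad \xi_j \ge 0 .

Each hypothesis must be separated from the oracle by a model-score margin at least as large as its BLEU loss; C bounds how far the weights move on a single example, and the slack variables allow violated constraints at a cost.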

  15. FOREST-BASED TRAINING [Figure: scatter of BLEU (0.05 to 0.5) against model score (-46 to -34)]

  16. PARALLEL TRAINING
      • Run n MIRA learners in parallel (Hiero: n = 20; Syntax: n = 73)
      • Share information among learners
      [Diagram: each learner repeatedly decodes, updates, and broadcasts]
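
A toy, single-process sketch of the decode/update/broadcast pattern above. The decode-plus-MIRA step is stubbed out, the learners run in synchronous rounds, and broadcast deltas are simply summed into every learner's weights; these are simplifications for illustration, not the paper's exact communication scheme.

    import random

    def local_update(sentence):
        """Stub for decode + MIRA update on one sentence; returns a sparse weight delta."""
        rng = random.Random(sentence)
        return {"feat%d" % rng.randint(0, 9): rng.uniform(-0.1, 0.1)}

    def train(sentences, n_learners=4):
        weights = [dict() for _ in range(n_learners)]    # one weight vector per learner
        shards = [sentences[i::n_learners] for i in range(n_learners)]
        for step in range(max(len(s) for s in shards)):
            # decode + update: each learner processes the next sentence in its shard
            deltas = [local_update(shards[k][step]) if step < len(shards[k]) else {}
                      for k in range(n_learners)]
            # broadcast: every learner folds in every learner's delta
            for k in range(n_learners):
                for delta in deltas:
                    for f, v in delta.items():
                        weights[k][f] = weights[k].get(f, 0.0) + v
        return weights

    print(train(["sentence %d" % i for i in range(8)])[0])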

  17. Features

  18. DISCOUNT FEATURES
      [Figure: an example translation rule covering 晚上 NP-1 左右 ("around NP-1 p.m."), extracted with count=1]
      • Low counts are often overestimates
      • Introduce a count=1 feature that fires on 1-count rules, etc.
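
A minimal sketch of discount features as described on this slide. The count buckets and feature-name strings below are illustrative assumptions; only the idea of an indicator that fires on rules with a given low training count comes from the slide.

    # Indicator "discount" features that fire on low-count rules.
    COUNT_BUCKETS = (1, 2, 3)        # illustrative buckets: count=1, count=2, count=3

    def discount_features(rule_count):
        """Binary features for a rule extracted `rule_count` times from the training data."""
        feats = {}
        for c in COUNT_BUCKETS:
            if rule_count == c:
                feats["count=%d" % c] = 1.0
        return feats

    print(discount_features(1))    # {'count=1': 1.0}  -- the learner can downweight 1-count rules
    print(discount_features(50))   # {}                -- frequent rules get no discount feature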

  19. TARGET SYNTAX FEATURES [Figure: two target trees for "UN inspectors ... expelled by NK", one inserting "were" ("UN inspectors were expelled by NK"), firing the insert-were feature]

  20. TARGET SYNTAX FEATURES [Figure: parse fragments from system output ("... in mind", "thinking of ...", "the best-selling book ... published his autobiography ...") illustrating the bad-rewrite feature]

  21. TARGET SYNTAX FEATURES [Figure: two parses of ", Yoon said ...", differing in where the comma node attaches, illustrating the node=, feature]

  22. TARGET SYNTAX FEATURES [Figure: two translations of 第一 个 站 出 来 (gloss: first / stand / come out), "the first to stand up" vs. "the first leg from ...", distinguished by root=VP and root=IN features on the embedded constituent]
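
A hedged sketch tying together the target-syntax indicators shown on slides 19-22 (insert-X, node=X, root=X). The Rule record and its fields are hypothetical stand-ins for whatever the real grammar objects carry; the exact feature definitions in the paper may differ.

    from collections import namedtuple

    # Hypothetical rule record: target words inserted with no aligned source word,
    # the nonterminal labels of the rule's target-side tree fragment, and the fragment's root.
    Rule = namedtuple("Rule", "inserted_words tree_node_labels tree_root_label")

    def target_syntax_features(rule):
        feats = {}
        for w in rule.inserted_words:                 # e.g. inserting "were" fires insert-were
            feats["insert-" + w] = 1.0
        for label in rule.tree_node_labels:           # count each nonterminal label in the fragment
            feats["node=" + label] = feats.get("node=" + label, 0.0) + 1.0
        feats["root=" + rule.tree_root_label] = 1.0   # label at the root of the fragment
        return feats

    rule = Rule(inserted_words=["were"], tree_node_labels=["VP", "VBD", "VP"], tree_root_label="VP")
    print(target_syntax_features(rule))
    # {'insert-were': 1.0, 'node=VP': 2.0, 'node=VBD': 1.0, 'root=VP': 1.0}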

  23. SOURCE CONTEXT FEATURES (Marton & Resnik 2008; Chiang et al. 2008)
      [Figure: source 这 是 一 个 值得 关注 和 研究 的 新 动向 (gloss: this is a ... merit attention and study ... new trend); the output "new trends in the study" crosses the source VP bracket, firing cross-VP]
      • Use an external parser to infer source-side syntax
      • Rewards and penalties for matching/crossing brackets

  24. SOURCE CONTEXT FEATURES (Marton & Resnik 2008; Chiang et al. 2008)
      [Figure: the same source sentence; the output "meriting attention and study" matches the source VP bracket, firing match-VP]
      • Use an external parser to infer source-side syntax
      • Rewards and penalties for matching/crossing brackets
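
A sketch of the match/cross bracket test behind these features, assuming the source parse is available as (start, end, label) spans; the span-overlap logic is the standard crossing-brackets check, while the feature names mirror the slides (match-VP, cross-VP) but are otherwise illustrative.

    # Spans are half-open [i, j) over source words.
    def bracket_features(decoder_span, source_constituents):
        i, j = decoder_span
        feats = {}
        for a, b, label in source_constituents:
            if (i, j) == (a, b):
                # reward: the decoder used a span that exactly matches a source constituent
                feats["match-" + label] = feats.get("match-" + label, 0.0) + 1.0
            elif i < a < j < b or a < i < b < j:
                # penalty: the spans overlap but neither contains the other
                feats["cross-" + label] = feats.get("cross-" + label, 0.0) + 1.0
        return feats

    constituents = [(4, 13, "VP")]                   # e.g. the VP bracket in the parsed source sentence
    print(bracket_features((4, 13), constituents))   # {'match-VP': 1.0}
    print(bracket_features((2, 7), constituents))    # {'cross-VP': 1.0}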

  25. SOURCE CONTEXT FEATURES (Chiang et al. 2008)
      [Figure: source 挪威 恢复 在 斯里 兰 卡 的 和平 斡旋 (gloss: Norway / restore / in / Sri Lanka / peace / mediation) with two candidate outputs: "to restore peace in Sri Lanka , the Norwegian mediation" and "Norway restoring peace mediation in Sri Lanka"]

  26. SOURCE CONTEXT FEATURES
      • Word context features: similar to Watanabe et al. 2007 and to work on WSD in MT (Chan et al. 2007; Carpuat & Wu 2007)
      • Relate a word's translation with its left or right neighbor on the source side (just the 100 most frequent types)
      • Feature templates: (f_{i-1}, f_i, e) and (f_i, f_{i+1}, e)

  27. SOURCE CONTEXT FEATURES
      [Figure: source 他 说 , 由于 没 有 配音 , 他 不得不 (gloss: he / said / , / because / no / voice / he / had to) with two outputs:
      "since there is no voice , he said , he had to" fires f_i = , & f_{i-1} = 说 & e = ,
      "he said that because of the lack of voice , he had to" fires f_i = , & f_{i-1} = 说 & e = that]
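
A sketch of the word-context templates from slide 26, (f_{i-1}, f_i, e) and (f_i, f_{i+1}, e), applied to the example above. It assumes we know which source position i the target word e came from, and it gates on a precomputed set of the 100 most frequent source types; exactly which of the words the frequency cutoff applies to is an assumption here.

    from collections import Counter

    def frequent_source_types(source_sentences, k=100):
        """The k most frequent source-side word types."""
        counts = Counter(w for sent in source_sentences for w in sent)
        return {w for w, _ in counts.most_common(k)}

    def word_context_features(src, i, e, frequent):
        """Relate target word e to the source word f_i it translates and to f_i's neighbors."""
        feats = {}
        f_i = src[i]
        if f_i not in frequent:
            return feats
        if i > 0 and src[i - 1] in frequent:
            feats["f-1=%s&f=%s&e=%s" % (src[i - 1], f_i, e)] = 1.0
        if i + 1 < len(src) and src[i + 1] in frequent:
            feats["f=%s&f+1=%s&e=%s" % (f_i, src[i + 1], e)] = 1.0
        return feats

    # The slide's example: the source comma (preceded by 说) translated as "that".
    src = ["他", "说", ",", "由于", "没", "有", "配音", ",", "他", "不得不"]
    freq = set(src)                      # pretend all of these are among the frequent types
    print(word_context_features(src, 2, "that", freq))
    # {'f-1=说&f=,&e=that': 1.0, 'f=,&f+1=由于&e=that': 1.0}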

  28. Experiments

  29. TRAINING DATA (GALE 2008 Chinese-English data)
                        Hiero   Syntax
      Parallel data      260M      65M
      Language model       2G       1G
      MERT/MIRA           58k      58k
      Test                57k      57k

  30. RESULTS (HIERO), Chinese-English
      Training   Features                                #     BLEU
      MERT       baseline                               11     36.1
      MIRA       +source-side syntax, +distortion       56     36.9
      MIRA       +discount                              61     37.3
      MIRA       +word context                      10,990     37.6

  31. RESULTS (SYNTAX), Chinese-English
      Training   Features                                #     BLEU
      MERT       baseline                               25     39.5
      MIRA       baseline                               25     39.8
      MIRA       rule overlap                          132     39.9
      MIRA       node count                            136     40.0
      MIRA       +discount, +bad rewrite, +insertion   283     40.6

  32. CONCLUSIONS
      • Using underutilized information for new features:
        • Source context is computationally efficient
        • Target syntax provides a rich structure
      • MIRA is working well on new features, systems, languages
