SLIDE 1

Syntactically Guided Neural Machine Translation

Felix Stahlberg, Eva Hasler, Aurelien Waite, and Bill Byrne

Department of Engineering

SLIDE 2

Neural machine translation (NMT) vs. Hiero

NMT

  • Simple beam search*
  • No explicit coverage mechanism*
  • Limited vocabulary size*
  • Long-range context (RNN)

Hiero

  • Searches over a vast number of translations
  • CKY parses cover the complete source sentence
  • Very large vocabularies, open to extension
  • Limited LM context, weak translation model

*: Vanilla formulation of attentional NMT according to Bahdanau et al., 2015

SLIDE 3

Combining NMT and Hiero scores

  • NMT left-to-right factorization:
  • NMT+Hiero via log-linear model combination

x: source sentence
y = y_1^T: target sentence

  • UNK score is used for NMT OOVs
  • Hiero predictive posteriors through FST weight pushing
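
Written out in the notation of this slide, the two bullets could look roughly as follows. This is a sketch assembled from the symbols that appear elsewhere in the deck (P_Hiero, λ_NMT, λ_Hiero, the UNK fallback for NMT OOVs); the name Σ_NMT for the NMT target vocabulary and the score symbol s are assumptions introduced here for illustration.

```latex
% NMT left-to-right factorization
P_{NMT}(y_1^T \mid \mathbf{x}) = \prod_{t=1}^{T} P_{NMT}(y_t \mid y_1^{t-1}, \mathbf{x})

% Log-linear combination of NMT and Hiero at target position t
% (\Sigma_{NMT}: assumed symbol for the limited NMT vocabulary)
s(y_t \mid y_1^{t-1}, \mathbf{x}) =
  \lambda_{Hiero} \log P_{Hiero}(y_t \mid y_1^{t-1}, \mathbf{x})
  + \lambda_{NMT}
  \begin{cases}
    \log P_{NMT}(y_t \mid y_1^{t-1}, \mathbf{x}) & \text{if } y_t \in \Sigma_{NMT} \\
    \log P_{NMT}(\mathrm{unk} \mid y_1^{t-1}, \mathbf{x}) & \text{otherwise}
  \end{cases}
```

The second case is where the UNK score stands in for words that Hiero can produce but the NMT vocabulary does not contain.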

SLIDE 4

FST weight pushing

Hiero lattice (arcs as word|weight): <s>|0.5, b|0.5, a|0.8, </s>|0.1, </s>|1.0, a|0.4, b|0.5, c|0.4, b|0.8, c|0.9

Hiero lattice after weight pushing: <s>|0.409, b|0.061, a|0.939, </s>|1.0, </s>|1.0, a|0.042, b|1.0, c|0.208, b|0.75, c|1.0

Arcs leaving the state reached by "<s> a" after weight pushing: a|0.042, c|0.208, b|0.75

P_Hiero(y_3 = a | <s> a, x) = 0.042
P_Hiero(y_3 = c | <s> a, x) = 0.208
P_Hiero(y_3 = b | <s> a, x) = 0.75
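
The arc weights on this slide can be reproduced with a few lines of Python. The following is a minimal sketch, not the authors' implementation: it pushes weights toward the initial state in the probability semiring on an acyclic lattice. The state numbering and arc topology are assumptions, chosen because they are consistent with the weights listed before and after pushing; the function names are hypothetical.

```python
from collections import defaultdict

# Arcs as (source, target, word, weight). The topology is an assumption that
# is consistent with the weights shown on the slide.
ARCS = [
    (0, 1, "<s>", 0.5),
    (1, 2, "b", 0.5),
    (1, 3, "a", 0.8),
    (2, 7, "</s>", 0.1),
    (3, 2, "a", 0.4),
    (3, 4, "c", 0.4),
    (3, 5, "b", 0.8),
    (4, 6, "b", 0.5),
    (5, 6, "c", 0.9),
    (6, 7, "</s>", 1.0),
]
INITIAL, FINAL = 0, 7


def backward_potentials(arcs, final):
    """Total weight of all paths from each state to the final state."""
    outgoing = defaultdict(list)
    for src, dst, _word, weight in arcs:
        outgoing[src].append((dst, weight))
    potentials = {final: 1.0}

    def potential(state):
        # Memoised recursion; terminates because the lattice is acyclic.
        if state not in potentials:
            potentials[state] = sum(w * potential(dst) for dst, w in outgoing[state])
        return potentials[state]

    for src, _dst, _word, _weight in arcs:
        potential(src)
    return potentials


def push_weights(arcs, initial, final):
    """Reweight arcs so that the arcs leaving each non-initial state sum to 1.

    Arcs leaving the initial state keep the total path weight, which is why
    the first arc on the slide carries 0.409 rather than 1.0.
    """
    d = backward_potentials(arcs, final)
    pushed = []
    for src, dst, word, weight in arcs:
        norm = 1.0 if src == initial else d[src]
        pushed.append((src, dst, word, weight * d[dst] / norm))
    return pushed


def predictive_posterior(pushed_arcs, state):
    """Distribution over the next word, read off the arcs leaving `state`."""
    return {word: w for src, _dst, word, w in pushed_arcs if src == state}


if __name__ == "__main__":
    pushed = push_weights(ARCS, INITIAL, FINAL)
    # State 3 is the state reached after reading "<s> a".
    for word, prob in predictive_posterior(pushed, 3).items():
        print(f"P_Hiero(y_3 = {word} | <s> a, x) = {prob:.3f}")
```

After pushing, the weights on the arcs leaving a state can be read directly as P_Hiero(y_t | y_1^{t-1}, x), which is the quantity the log-linear combination needs at each decoding step.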

SLIDE 5

Results on news-test2014

System                                              English-German (BLEU)   English-French (BLEU)
Baselines and related work
  Hiero baseline (de Gispert et al., 2010)          19.44                   32.86
  Basic NMT (RNNsearch) (Bahdanau et al., 2015)     16.31                   30.42
  RNNsearch-LV + UNK Replace (Jean et al., 2015)    19.40                   34.60
This work
  Syntactically guided NMT (λ_Hiero = 0)            20.69                   35.37
  Syntactically guided NMT (tuned λ_NMT, λ_Hiero)   21.87                   36.61
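
As the row labels indicate, the two "This work" systems differ in the interpolation weights. A small sketch of a per-word log-linear score shows what setting λ_Hiero = 0 does: the Hiero lattice still decides which words are candidates, but only the NMT score ranks them. The Hiero posteriors below are the ones from the weight pushing slide; the NMT probabilities, the `<unk>` token name, and the function names are made up for illustration.

```python
import math

UNK = "<unk>"  # hypothetical name for the NMT unknown-word token


def combined_score(word, nmt_probs, hiero_probs, lambda_nmt, lambda_hiero):
    """Log-linear score of one candidate word at one target position.

    nmt_probs:   word -> P_NMT(word | prefix, x); must contain UNK.
    hiero_probs: word -> P_Hiero(word | prefix, x) from the pushed lattice.
    Words outside the NMT vocabulary fall back to the NMT UNK probability.
    """
    nmt_p = nmt_probs.get(word, nmt_probs[UNK])
    return lambda_nmt * math.log(nmt_p) + lambda_hiero * math.log(hiero_probs[word])


def best_word(nmt_probs, hiero_probs, lambda_nmt, lambda_hiero):
    """Pick the best next word among the words the lattice allows."""
    return max(
        hiero_probs,
        key=lambda w: combined_score(w, nmt_probs, hiero_probs, lambda_nmt, lambda_hiero),
    )


if __name__ == "__main__":
    hiero = {"a": 0.042, "c": 0.208, "b": 0.75}  # from the weight pushing slide
    nmt = {"a": 0.6, "b": 0.3, UNK: 0.1}         # illustrative values; "c" is an NMT OOV
    print(best_word(nmt, hiero, lambda_nmt=1.0, lambda_hiero=0.0))
    print(best_word(nmt, hiero, lambda_nmt=1.0, lambda_hiero=1.0))
```

With these illustrative numbers the first call selects "a" (NMT alone prefers it) and the second selects "b" (the Hiero posterior outweighs the NMT preference).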

SLIDE 6

Results on news-test2015 (English-German)

Search space                               # of node expansions per sentence         BLEU
100-best rescoring                         2,233.6 (Depth-First Search: 832.1)       22.9
1000-best rescoring                        21,686.2 (Depth-First Search: 6,221.8)    23.5
Lattice-based (Syntactically guided NMT)   244.3                                     24.0

NMT baseline: 19.5 BLEU
Hiero baseline (with NPLM): 21.7 BLEU
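
One way to picture the lattice-based search space is a beam search that only follows arcs of the pushed Hiero lattice and counts how many hypotheses it extends. The sketch below is illustrative only: the lattice topology is an assumption consistent with the weight pushing slide, the NMT scorer is a stub that returns a constant, and counting one expansion per extended hypothesis is an assumed reading of "node expansions".

```python
import math

# Pushed lattice: state -> list of (word, next_state, P_Hiero).
# Topology assumed; weights taken from the weight pushing slide.
LATTICE = {
    0: [("<s>", 1, 0.409)],
    1: [("b", 2, 0.061), ("a", 3, 0.939)],
    2: [("</s>", 7, 1.0)],
    3: [("a", 2, 0.042), ("c", 4, 0.208), ("b", 5, 0.75)],
    4: [("b", 6, 1.0)],
    5: [("c", 6, 1.0)],
    6: [("</s>", 7, 1.0)],
}
START, FINAL = 0, 7


def stub_nmt_log_prob(prefix, word):
    """Stand-in for an NMT model; a real scorer would condition on prefix and x."""
    return math.log(0.5)


def lattice_beam_search(lattice, start, final, nmt_log_prob,
                        beam_size, lambda_nmt=1.0, lambda_hiero=1.0):
    """Beam search restricted to lattice paths; returns (words, score, expansions)."""
    beam = [(0.0, start, ())]  # (score, lattice state, target prefix)
    finished = []
    expansions = 0
    while beam:
        candidates = []
        for score, state, prefix in beam:
            expansions += 1  # one expansion per hypothesis that is extended
            for word, nxt, hiero_p in lattice[state]:
                new_score = (score
                             + lambda_nmt * nmt_log_prob(prefix, word)
                             + lambda_hiero * math.log(hiero_p))
                hyp = (new_score, nxt, prefix + (word,))
                (finished if nxt == final else candidates).append(hyp)
        # Keep only the best `beam_size` partial hypotheses.
        beam = sorted(candidates, key=lambda h: h[0], reverse=True)[:beam_size]
    best_score, _state, best_words = max(finished, key=lambda h: h[0])
    return best_words, best_score, expansions


if __name__ == "__main__":
    words, score, n = lattice_beam_search(LATTICE, START, FINAL,
                                          stub_nmt_log_prob, beam_size=2)
    print(" ".join(words), round(score, 3), n)
```

Because each hypothesis can only be extended along lattice arcs, the number of extensions stays small, whereas n-best rescoring has to score every word of every list entry.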

SLIDE 7

Conclusion

  • Using syntactic SMT to guide neural machine translation shows great potential
  • Our lattice-based approach is faster (fewer node expansions) and better (higher BLEU) than n-best list rescoring
  • More discussion in the paper
      • NMT modelling vs. search errors
      • Local softmax
      • Beam size
      • Lattice size
      • …

SLIDE 8

References

  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In ICLR.
  • Adria de Gispert, Gonzalo Iglesias, Graeme Blackwood, Eduardo R. Banga, and William Byrne. 2010. Hierarchical phrase-based translation with weighted finite-state transducers and shallow-n grammars. Computational Linguistics, 36(3):505–533.
  • Sebastien Jean, Kyunghyun Cho, Roland Memisevic, and Yoshua Bengio. 2015a. On using very large target vocabulary for neural machine translation. In ACL, pages 1–10.

SLIDE 9

Thanks

Code available at http://ucam-smt.github.io/sgnmt/html

SLIDE 10

BACKUP

SLIDE 11

Results

SLIDE 12

Beam size

SLIDE 13

Lattice size

SLIDE 14

Data

SLIDE 15

RNN Update