Cross-lingual Language Model Pretraining




  1. Cross-lingual Language Model Pretraining. Alexis Conneau and Guillaume Lample, Facebook AI Research

  2. Why learn cross-lingual representations? Example: the same sentence expressed in three languages: "This is great." (English), "C'est super." (French), "Das ist toll." (German).

  3. Cross-lingual language models

  4. Multilingual Masked Language Modeling (MLM). Similar to BERT, we pretrain a Transformer model with MLM, but on text in many languages: multilingual representations emerge from a single MLM trained across many languages. Devlin et al. – BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (+ mBERT)
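
To make the masking objective concrete, here is a minimal sketch of the BERT-style masking that multilingual MLM applies uniformly to batches from every language. It is a sketch, not the XLM code: `mask_tokens`, `MASK_ID`, and `VOCAB_SIZE` are illustrative placeholders, and the 80/10/10 replacement split follows the standard BERT recipe.

```python
import torch

# Illustrative constants: the real model uses a shared BPE vocabulary learned on
# the concatenation of all languages; MASK_ID / VOCAB_SIZE here are placeholders.
VOCAB_SIZE = 95_000
MASK_ID = 0
MASK_PROB = 0.15  # fraction of positions selected for prediction

def mask_tokens(token_ids: torch.Tensor):
    """BERT-style masking: of the selected positions, 80% become [MASK],
    10% become a random token, and 10% are left unchanged."""
    inputs = token_ids.clone()
    labels = token_ids.clone()

    # Choose which positions the model has to predict.
    selected = torch.bernoulli(torch.full(token_ids.shape, MASK_PROB)).bool()
    labels[~selected] = -100  # default ignore_index of nn.CrossEntropyLoss

    # 80% of the selected positions are replaced by the [MASK] token.
    masked = torch.bernoulli(torch.full(token_ids.shape, 0.8)).bool() & selected
    inputs[masked] = MASK_ID

    # Half of the remaining selected positions get a random vocabulary token.
    randomized = (torch.bernoulli(torch.full(token_ids.shape, 0.5)).bool()
                  & selected & ~masked)
    inputs[randomized] = torch.randint(VOCAB_SIZE, token_ids.shape)[randomized]

    return inputs, labels

# The same routine is applied to monolingual batches from every language, and a
# single shared Transformer predicts the masked tokens, which is what lets
# cross-lingual representations emerge without any parallel data.
```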

  5. Translation Language Modeling (TLM). Multilingual MLM is unsupervised, but with TLM we also leverage parallel data: a translation pair is concatenated into a single stream and masked jointly, to encourage the model to leverage cross-lingual context when making predictions.
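
As a rough illustration of how a TLM example can be assembled (not the exact XLM implementation: `build_tlm_example` is a made-up helper and the separator handling is simplified), the translated sentence pair is packed into one stream with restarted positions and per-token language ids:

```python
import torch

def build_tlm_example(src_ids, tgt_ids, src_lang_id, tgt_lang_id):
    """Concatenate a translation pair into one input stream, as in TLM.
    src_ids and tgt_ids are 1-D LongTensors of subword ids. Position indices
    restart at zero for the target sentence, and a language id marks which
    half each token belongs to. Illustrative layout only: the real XLM code
    also inserts sentence separator tokens around each half."""
    tokens = torch.cat([src_ids, tgt_ids])
    positions = torch.cat([torch.arange(len(src_ids)),
                           torch.arange(len(tgt_ids))])
    langs = torch.cat([torch.full((len(src_ids),), src_lang_id, dtype=torch.long),
                       torch.full((len(tgt_ids),), tgt_lang_id, dtype=torch.long)])
    return tokens, positions, langs

# The same mask_tokens routine from the MLM sketch is then applied over the
# whole stream, so a masked English word can be predicted from its French
# translation (and vice versa).
```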

  6. Results on cross-lingual understanding (XLU) benchmarks

  7. Results on Cross-lingual Classification (XNLI). The pretrained encoder is fine-tuned on the English XNLI(*) training data and then tested on all 15 XNLI languages (zero-shot cross-lingual classification). Average XNLI accuracy over the 15 languages:
     XNLI baseline     65.6
     mBERT             66.3
     LASER             70.2
     XLM (MLM)         71.5
     XLM (MLM+TLM)     75.1
     (*) Conneau et al. – XNLI: Evaluating Cross-lingual Sentence Representations (EMNLP 2018)
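
A minimal sketch of the zero-shot fine-tuning setup, assuming the pretrained encoder exposes a single pooled vector per premise-hypothesis pair; `XNLIClassifier` and its dimensions are illustrative, not the paper's exact head:

```python
import torch.nn as nn

class XNLIClassifier(nn.Module):
    """Pretrained cross-lingual encoder plus a small classification head.
    `pretrained_encoder` is assumed to return one vector per premise-hypothesis
    pair (e.g. the first-token hidden state); the 3 classes are
    entailment / neutral / contradiction."""
    def __init__(self, pretrained_encoder, hidden_dim=1024, n_classes=3):
        super().__init__()
        self.encoder = pretrained_encoder
        self.head = nn.Linear(hidden_dim, n_classes)

    def forward(self, token_ids, lang_ids):
        pooled = self.encoder(token_ids, lang_ids)  # assumed (batch, hidden_dim)
        return self.head(pooled)                    # (batch, n_classes) logits

# Fine-tune on the English NLI training data only; the same weights are then
# evaluated directly on the other 14 XNLI languages (zero-shot transfer).
```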

  8. Results on Unsupervised Machine Translation. Initialization is key in unsupervised MT to bootstrap the iterative back-translation (BT) process: embedding-layer initialization is essential for neural unsupervised MT (*), and initializing the full Transformer model significantly improves performance (+7 BLEU). BLEU scores:
     Embeddings pretrained              27.3
     Full model pretrained (CLM)        30.5
     Full model pretrained (MLM)        34.3
     Supervised 2016 SOTA (Edinburgh)   36.2
     (*) Lample et al. – Phrase-based and neural unsupervised machine translation (EMNLP 2018)
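
The initialization point can be pictured as reusing the same pretrained weights for both halves of the translation model. This is only a schematic sketch with a hypothetical `pretrained_xlm` object; the released code additionally has to introduce encoder-decoder attention in the decoder, which has no pretrained counterpart:

```python
import copy

def init_unsupervised_nmt(pretrained_xlm):
    """Initialize both the encoder and the decoder of an encoder-decoder
    translation model from the same pretrained cross-lingual LM.
    `pretrained_xlm` is a hypothetical stand-in for a loaded XLM model."""
    encoder = copy.deepcopy(pretrained_xlm)
    decoder = copy.deepcopy(pretrained_xlm)
    # Note: the decoder additionally needs encoder-decoder (cross) attention,
    # which has no pretrained counterpart and starts from random weights.
    return encoder, decoder

# Training then alternates denoising auto-encoding with on-the-fly
# back-translation: translate monolingual sentences with the current model and
# treat the (synthetic, real) pairs as supervised examples for the reverse
# direction, iterating so both directions improve together.
```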

  9. Results on Supervised Machine Translation. We also show the importance of pretraining for generation: • Pretraining both the encoder and the decoder improves the BLEU score • MLM pretraining works better than causal LM (CLM) pretraining • Back-translation + pretraining leads to the best BLEU score • Pretraining is more important when the supervised data is small. [Chart: BLEU for no pretraining vs. full model pretrained (CLM) vs. full model pretrained (MLM), each with and without back-translation.]

  10. Conclusion • Cross-lingual language model pretraining is very effective for XLU • New state of the art for cross-lingual classification on XNLI • Reduces the gap between unsupervised and supervised MT • Recent developments have improved XLM/mBERT models

  11. Thank you! Code and models available at github.com/facebookresearch/XLM. Lample & Conneau – Cross-lingual Language Model Pretraining (NeurIPS 2019)
