continued training algorithms

Continued Training Algorithms. Huda Khayrallah, Jeremy Gwinnup, SCALE

SCALE Readout, August 9, 2018. [Diagram, built up over several steps: source sentence "Wasch dir die Hände", Source Embedding, Encoder, Decoder]


  1. Extra Slides

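  2. [Diagram] Encoder-decoder NMT model. Labels: source "Wasch dir die Hände"; Source Embedding; Encoder; Decoder; Softmax; Target Embedding; target "Wash your hands".

  3. [Diagram] Encoder-decoder NMT model. Labels: source "Wasch dir die Hände"; Source Embedding; Encoder; Decoder; Softmax; Target Embedding; target "Wash your hands".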

  4. Hyperparameters
      Model architecture: num_embed="512:512"; rnn_num_hidden=512; rnn_attention_type="dot"; num_layers=2; rnn_cell_type="lstm"
      Vocabulary: BPE on source and target; num_words=30k:30k; word_min_count="1:1"; max_seq_len="100:100"
      Regularization: embed_dropout=0.0; rnn_dropout=0.1; label_smoothing=0.1
      Training configuration: batch_size=4096; optimizer=adam; initial_learning_rate=0.0003; learning_rate_reduce_factor=0.7; loss="cross-entropy"; checkpoint_frequency=4000
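The colon-separated values and the learning-rate settings on the hyperparameter slide can be unpacked with a short sketch. It assumes that a value such as "512:512" means source:target, and that each learning-rate reduction multiplies the current rate by learning_rate_reduce_factor; the slide does not say when a reduction is triggered. The helper names `split_pair` and `learning_rate_after` are hypothetical and not taken from any toolkit.

```python
def split_pair(value):
    """Split a 'source:target' string into its two parts.

    Assumption: the colon separates the source-side setting from the
    target-side setting, e.g. num_embed="512:512".
    """
    source, target = value.split(":")
    return source, target


def learning_rate_after(reductions, initial=0.0003, factor=0.7):
    """Learning rate after `reductions` multiplicative reductions.

    Defaults are initial_learning_rate and learning_rate_reduce_factor
    from the slide; the trigger for a reduction is not modeled here.
    """
    return initial * factor ** reductions


source_embed, target_embed = split_pair("512:512")
rates = [learning_rate_after(k) for k in range(4)]
print(source_embed, target_embed)
print(rates)
```

Under these assumptions the rate falls from 0.0003 to 0.00021 after one reduction and to 0.000147 after two.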

  5. Alternate MT explanation

  6. Case Study: Our office needs to translate a lot of Russian patents. We have a few translators, but they can only process a small fraction of our data. We would like to use machine translation to find the most interesting documents and let our translators focus on those. We know neural machine translation has state-of-the-art performance, so we decide to build a neural system…

  7. MT training: General Domain Data → General Domain NMT Model

  8. MT training: In-domain Data → In-Domain NMT Model

  9. MT training: General Domain Data + In-domain Data → Mixed Domain NMT Model

  10. MT training: In-domain Data + General Domain NMT Model → Continued-Training NMT Model

  11. MT training: 50M General Domain sentence pairs → General Domain NMT Model

  12. Continued Training: Randomly Initialized NMT Model → (train on general domain data) → General Domain NMT Model → (continue training on in-domain data) → Domain Adapted NMT Model
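The two-stage recipe on the continued-training slide (train on general domain data, then continue training on in-domain data) can be illustrated with a toy sketch. This is not an NMT system: it assumes a one-dimensional linear model trained with plain SGD, and the data, the function names `sgd_train` and `mse`, and all settings are hypothetical. The only point it makes is that the second stage reuses the same training loop but starts from the general-domain weights instead of a random initialization.

```python
import random


def sgd_train(weights, data, learning_rate=0.05, epochs=20):
    """Run plain SGD for y ~ w*x + b, starting from the given weights."""
    w, b = weights
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            w -= learning_rate * err * x
            b -= learning_rate * err
    return (w, b)


def mse(weights, data):
    """Mean squared error of the linear model on a data set."""
    w, b = weights
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
# Hypothetical toy data: a large "general" set and a small "in-domain"
# set that follows a different input-output relationship.
general_data = [(x, 2.0 * x + 1.0)
                for x in (rng.uniform(-1, 1) for _ in range(200))]
in_domain_data = [(x, 3.0 * x - 1.0)
                  for x in (rng.uniform(-1, 1) for _ in range(20))]

# Stage 1: randomly initialized model trained on general domain data.
random_init = (rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1))
general_model = sgd_train(random_init, general_data, epochs=20)

# Stage 2: continued training, initialized from the general model.
adapted_model = sgd_train(general_model, in_domain_data, epochs=50)

print("general model on in-domain data:", mse(general_model, in_domain_data))
print("adapted model on in-domain data:", mse(adapted_model, in_domain_data))
```

In this toy setting the adapted model fits the in-domain data far better than the general model does, which mirrors the domain-mismatch motivation in the slides without saying anything about the size of the effect in real translation systems.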

  13. General Domain NMT: 50M General Domain sentence pairs → General Domain NMT Model

  14. General Domain NMT: errors due to domain mismatch
      Source: дверной замок повышенной степени защищенности от взлома
      Human: door lock with increased degree of security against burglary
      System: door security door security door

  15. Keyword Search

  16. Keyword Search: a (sort of) extrinsic measure of MT output quality, based on the ability to retrieve (i.e., match) words or phrases. [Insert cartoon]

  17. Human assigned categories
      Keyword: venture capitalist, zero gravity, hydrogen
      Sentiment: fantastic, messy, bad, happy
      Person: Heidi, Chris, Leonardo da Vinci, Aristotle
      Organization: Toyota, UNESCO, Ikea, Swedish Army
      Geo-Political Entity: Egypt, San Francisco, Haiti
      Location: Arctic, Africa, hospital, ER, lobby
      Date: Friday, 1980s, last March, today
      Temporal Expression: 4:00 am, 30-second, six weeks
      Numeric Expression: 20 percent, 27 kilometers, one-fifth, two nurses

  18. This metric is pessimistic
      Inexact matches count as failure
      Tokenization issues exacerbate measures: 70 year old vs. 70-year-old
      Alternative (very acceptable) translations can count as failure
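The pessimism described above is easy to reproduce with a small sketch of exact-match keyword retrieval. The slides do not give the scoring procedure, so everything here is an assumption: lowercased whitespace tokenization, contiguous exact matching, and the hypothetical helper names `contains_phrase` and `keyword_recall`.

```python
def contains_phrase(phrase, sentence):
    """Exact match: the phrase's tokens must appear contiguously."""
    p = phrase.lower().split()
    s = sentence.lower().split()
    return any(s[i:i + len(p)] == p for i in range(len(s) - len(p) + 1))


def keyword_recall(keywords, mt_output):
    """Fraction of keywords found verbatim in the MT output."""
    hits = sum(contains_phrase(k, mt_output) for k in keywords)
    return hits / len(keywords)


keywords = ["70 year old", "hospital"]
mt_output = "The 70-year-old man left the hospital"

# "70 year old" splits into three tokens, but the output contains the
# single token "70-year-old", so this keyword counts as a miss.
recall = keyword_recall(keywords, mt_output)
print(recall)
```

Only "hospital" is retrieved here, so the score is 0.5 even though the translation conveys both keywords.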

  19. Results

  20. [Bar chart] Russian Patent, BLEU (axis 0 to 40). Systems: General Domain, In-Domain, Continued Training. Annotated gain: +10.1.

  21. [Bar chart] Russian Patent, BLEU (axis 0 to 40). Systems: General Domain, In-Domain, Continued Training, Online A. Annotated gain: +7.2.

  22. [Bar chart] Patent Results, BLEU (axis 0 to 70). Languages: German, Korean, Russian, Chinese. Systems: SMT Domain Adapted, Online A, NMT Continued Training. Annotated gains: +11.3, +9.8, +7.2, +4.6.

  23. [Bar chart] TED Results, BLEU (axis 0 to 45). Languages: Arabic, German, Farsi, Korean, Russian, Chinese. Systems: SMT Domain Adapted, Online A, Continued Training. Annotated gains: +1.4, +1.1, +0.0, +7.0, -0.6, -0.7.

  24. [Bar chart] Patent Results, BLEU (axis 0 to 70). Languages: German, Korean, Russian, Chinese. Systems: General Domain, In-Domain, Continued Training, Online A. Annotated gains: +0.4, +3.5, +7.2, +1.8.

  25. [Bar chart] TED Results, BLEU (axis 0 to 45). Languages: Arabic, German, Farsi, Korean, Russian, Chinese. Systems: General Domain, In Domain, Continued Training, Online A. Annotated gains: +1.4, +1.1, +0.0, +6.6, +1.6, -0.7.

  26. TED results (BLEU)
      Training data              Ar    De    Fa    Ko    Ru    Zh
      SMT General Domain         24.0  31.0  13.9   6.7  25.0  15.2
      SMT Mixed Domain           27.8  31.9  18.2  10.7  25.7  16.1
      NMT General Domain         29.6  34.6  22.2  11.6  23.4  15.9
      NMT In Domain (TED)        27.4  32.3  21.3  14.4  22.9  16.2
      NMT Mixed Domain           ---   35.6  ---   ---   24.5  17.8
      NMT Continued Training     35.4  39.9  27.9  17.2  28.6  20.4
      Microsoft Translator       34.3  38.5  20.9  17.9  28.6  21.0

  27. Patent results (BLEU)
      Training data              De    Ko    Ru    Zh
      SMT General Domain         26.6   2.4  21.4  13.7
      SMT Mixed Domain           50.6  21.7  29.0  29.8
      NMT General Domain         36.0   2.7  23.4  12.6
      NMT In Domain (TED)        61.9  29.9  26.9  40.2
      NMT Mixed Domain           58.4  ---   27.7  33.7
      NMT Continued Training     62.3  31.7  37.0  43.7
      Microsoft Translator       51.0  27.1  29.8  33.9
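As a worked check on the patent numbers, the sketch below subtracts the Microsoft Translator scores from the Continued Training scores. It assumes the row assignment read from the flattened table (Continued Training: 62.3, 31.7, 37.0, 43.7; Microsoft Translator: 51.0, 27.1, 29.8, 33.9, in the order De, Ko, Ru, Zh); the variable names are illustrative.

```python
languages = ["De", "Ko", "Ru", "Zh"]

# BLEU scores as read from the patent results table (assumed row mapping).
continued_training = [62.3, 31.7, 37.0, 43.7]
microsoft_translator = [51.0, 27.1, 29.8, 33.9]

# Gain of continued training over the online system, to one decimal place.
gains = {
    lang: round(ct - ms, 1)
    for lang, ct, ms in zip(languages, continued_training, microsoft_translator)
}
print(gains)
```

Under that reading the differences are 11.3 (De), 4.6 (Ko), 7.2 (Ru), and 9.8 (Zh), the same four values annotated on the patent bar chart that compares SMT Domain Adapted, Online A, and NMT Continued Training.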

  28. [Line chart] Patent: BLEU (axis 0 to 60) against an unlabeled x-axis from 0 to 8000. Series: German, Korean, Russian, Chinese.

  29. [Line chart] TED: BLEU (axis 0 to 40) against an unlabeled x-axis from 0 to 150000. Series: Russian.

  30. [Line chart] TED: BLEU (axis 0 to 40) against an unlabeled x-axis from 0 to 60000. Series: Arabic, German, Farsi, Korean, Russian, Chinese.

  31. [Line chart] TED: BLEU (axis 0 to 40) against an unlabeled x-axis from 0 to 8000. Series: Arabic, German, Farsi, Korean, Russian, Chinese.

  32. Human Eval

  33. [Bar chart] Continued Training vs General (axis 0% to 70%). Languages: Arabic, Korean, Chinese. Categories: Continued Training, Tie, General.

  34. [Bar chart] Continued Training vs Human (axis 0% to 70%). Languages: Arabic, Korean, Chinese. Categories: Continued Training, Tie, Human.

  35. [Bar chart] Continued Training vs General (axis 0 to 90). Languages: Arabic, Korean, Chinese. Categories: Continued Training, Tie, General.
