Continued Training Algorithms
Huda Khayrallah, Jeremy Gwinnup
SCALE Readout, August 9, 2018

[Figure: NMT encoder-decoder architecture, built up component by component (source embedding, encoder, decoder, target embedding, softmax), translating "Wasch dir die Hände" into "Wash your hands".]
[Bahdanau et al. 2015]
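To make the components concrete, here is a minimal PyTorch sketch of an attentional encoder-decoder in this spirit. It is a toy illustration only (single-layer LSTMs, dot-product attention, teacher forcing); the class and variable names are invented, and this is not the SCALE system:

```python
import torch
import torch.nn as nn

class TinyNMT(nn.Module):
    """Toy attentional encoder-decoder (illustrative; not the SCALE system)."""
    def __init__(self, src_vocab, tgt_vocab, dim=512):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, dim)            # "Source Embedding"
        self.tgt_embed = nn.Embedding(tgt_vocab, dim)            # "Target Embedding"
        self.encoder = nn.LSTM(dim, dim, batch_first=True)       # "Encoder"
        self.decoder = nn.LSTM(2 * dim, dim, batch_first=True)   # "Decoder" (word + context)
        self.softmax_layer = nn.Linear(dim, tgt_vocab)           # "Softmax" output layer

    def forward(self, src, tgt):
        enc_out, state = self.encoder(self.src_embed(src))       # (B, S, D)
        emb = self.tgt_embed(tgt)                                # (B, T, D), teacher forcing
        outs = []
        for t in range(emb.size(1)):
            query = state[0][-1].unsqueeze(1)                    # decoder hidden (B, 1, D)
            scores = torch.bmm(query, enc_out.transpose(1, 2))   # dot attention (B, 1, S)
            context = torch.bmm(torch.softmax(scores, -1), enc_out)
            out, state = self.decoder(torch.cat([emb[:, t:t+1], context], -1), state)
            outs.append(out)
        return self.softmax_layer(torch.cat(outs, 1))            # logits over target vocab
```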
MT Training
General Domain NMT
General Domain NMT Model, trained on 50M general-domain sentence pairs
Source: дверной замок повышенной степени защищенности от взлома
Human: door lock with increased degree of security against burglary
System: door security door security door
→ Errors due to domain mismatch
In-Domain NMT
In-Domain NMT Model, trained on 30k in-domain sentence pairs
Source: дверной замок повышенной степени защищенности от взлома
Human: door lock with increased degree of security against burglary
System: door lock for a high degree of protection against coke
→ Errors due to lack of data
Domain Adaptation
Continued Training
[Diagram: a randomly initialized NMT model is trained on 50M general-domain sentence pairs to produce a General Domain NMT Model, which is then continued-trained on 30k in-domain sentence pairs to produce a Continued Training NMT Model.]
Continued Training
Source: дверной замок повышенной степени защищенности от взлома
Human: door lock with increased degree of security against burglary
System: door lock with increased penetration protection
→ Improved performance!
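Conceptually, continued training is just ordinary training that starts from the general-domain checkpoint instead of a random initialization. A minimal PyTorch-style sketch, assuming a hypothetical `build_nmt_model` constructor and `in_domain_batches` iterator (the systems in this talk were actually trained with Sockeye):

```python
import torch

model = build_nmt_model()  # hypothetical constructor; architecture must match the checkpoint
model.load_state_dict(torch.load("general_domain_model.pt"))  # start from the general model

optimizer = torch.optim.Adam(model.parameters(), lr=0.0003)   # lr from the hyperparameter slide

for epoch in range(10):                  # until held-out in-domain BLEU stops improving
    for src, tgt in in_domain_batches:   # the small (e.g. 30k pairs) in-domain corpus
        optimizer.zero_grad()
        loss = model(src, tgt)           # assume the model returns cross-entropy loss
        loss.backward()
        optimizer.step()
```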
Results
BLEU
Weighted n-gram precision
- Between 0 and 1 (often scaled to 0-100)
- Higher is better
- Imperfect… but cheap, and correlates with human judgments

$$\text{BLEU} = \min\left(1, \frac{\text{output length}}{\text{reference length}}\right)\cdot\left(\prod_{i=1}^{4}\text{precision}_i\right)^{1/4}$$
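As a worked example, the formula above in a few lines of Python. This is the slide's simplified sentence-level BLEU; real corpus BLEU uses an exponential brevity penalty, corpus-level counts, and smoothing:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference (clipped counts)."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0

def simple_bleu(candidate, reference):
    """min(1, output_len / ref_len) times the geometric mean of 1-4 gram precisions."""
    score = min(1.0, len(candidate) / len(reference))
    for n in range(1, 5):
        score *= ngram_precision(candidate, reference, n) ** 0.25
    return score

hyp = "door lock with increased penetration protection".split()
ref = "door lock with increased degree of security against burglary".split()
print(f"simple BLEU = {simple_bleu(hyp, ref):.3f}")
```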
Russian Patent
[Chart: BLEU for General Domain, In-Domain, Mixed Domain, and Continued Training. Continued training: +9.3 BLEU.]
Patent Results
[Chart: BLEU for German, Korean, Russian, Chinese, comparing General Domain, In-Domain, and Continued Training. Continued-training gains: +0.4 (German), +1.8 (Korean), +10.1 (Russian), +3.5 (Chinese) BLEU.]
Patent Results
[Chart: as above, adding Online A. Gains: +0.4 (German), +1.8 (Korean), +7.2 (Russian), +3.5 (Chinese) BLEU.]
TED Results
[Chart: BLEU for Arabic, German, Farsi, Korean, Russian, Chinese, comparing General Domain, In-Domain, and Continued Training. Gains: +5.8, +5.3, +5.7, +4.2, +2.8, +6.6 BLEU.]
TED Results
[Chart: as above, adding Online A. Gains: +1.1, +1.4, -0.7, -1.6, +6.6, +0.0 BLEU.]
How much data do we need?
[Learning curves: BLEU vs. amount of in-domain training data.
- Patent, German (100,000-800,000 sentence pairs)
- Patent, German/Korean/Russian/Chinese (100,000-800,000)
- Patent, German/Korean/Russian/Chinese (10,000-60,000)
- TED, Arabic/German/Farsi/Korean/Russian/Chinese (50,000-150,000)]
Human Evaluation
Source: 等 了 十个月, 我 终 于 见到 了 他 - 将近 一年 啊。
  Output 1: waiting for 10, i finally met him - nearly a year .
  Output 2: after 10 months, i finally saw him - nearly a year.
  Ranking: Output 2 is better

Source: 这 就是 免费 的 代价 。
  Output 1: that's the price of free.
  Output 2: that's the cost of free.
  Ranking: Output 1 is better

Source: 我 是 说 , 我 已经 够 紧张 的 了
  Output 1: i mean, i'm nervous enough.
  Output 2: i mean, i'm nervous enough.
  Ranking: Both translations are about the same
Continued Training vs General
[Chart: preference counts for Arabic, Chinese, Korean: Continued Training / Tie / General.]
Keyword Search
[Chart: Farsi TED talk NER micro-averaged F1 (0.40-0.60), comparing General Domain SMT, Domain Adapted SMT, General Domain NMT, and Continued Training NMT.]
[Chart: TED talk NER micro-averaged F1 (0.40-0.60) for Arabic, German, Farsi, Korean, Russian, Chinese, same four systems.]
Research Directions
Analysis of Continued Training
Accepted at EMNLP workshop on Analyzing and Interpreting Neural Networks for NLP
- Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson and Philipp Koehn
[Diagram: NMT architecture (source embedding, encoder, decoder, target embedding, softmax), "Wasch dir die Hände" → "Wash your hands".]
Selective Training of Components
[Chart: BLEU gains over the general-domain baseline of 23.3; full continued training gains +11.4. Two series per component:
- Freeze that component, train the rest: Encoder +11.2, Decoder +11.8, Source Embed +10.9, Target Embed +11.3, Softmax +11.1
- Train only that component, freeze the rest: Encoder +11.2, Decoder +9.9, Source Embed +10.6, Target Embed +4.0, Softmax +9.3]
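Implementing the freeze/train split is a few lines in most toolkits. A PyTorch-style sketch, reusing the component names from the toy model earlier (the parameter-name prefixes are assumptions about how the model is organized):

```python
import torch

def set_trainable(model, prefix, trainable):
    """Mark parameters whose names start with `prefix` as trainable or frozen."""
    for name, param in model.named_parameters():
        if name.startswith(prefix):
            param.requires_grad = trainable

model = TinyNMT(30000, 30000)   # the toy model from the architecture sketch above

# Train only the decoder during continued training, freezing everything else:
for param in model.parameters():
    param.requires_grad = False
set_trainable(model, "decoder.", True)

# Build the optimizer over trainable parameters only:
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=0.0003)
```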
Elastic Weight Consolidation
Motivation
- Want to adapt NMT models to new domains – continued training is a great way to do this!
- However, specializing models with CT leads to 'forgetting' of info from the original domain.
- Test set BLEU (German):

  Model             General test   Patent test
  General               29.54          35.95
  CT-Patent              7.88          62.28
Elastic Weight Consolidation (EWC)
- In a nutshell: learn Task B with a model trained on Task A, without forgetting the important parts of Task A
- Utilize the Fisher matrix to characterize the sensitivity of parameters

"Overcoming catastrophic forgetting in neural networks" (Kirkpatrick et al., 2017)
New loss function:

$$L(\theta) = L_{\text{train}}(\theta) + \lambda \sum_i F_i \left(\theta_i - \theta_i^{g}\right)^2$$

where $\theta^g$ are the general-domain model's parameters and $F_i$ is the Fisher value for parameter $i$.
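In code, the EWC term is a Fisher-weighted squared distance to the general-domain parameters. A PyTorch-style sketch, assuming `fisher` and `general_params` are dicts keyed by parameter name, computed before continued training begins:

```python
def ewc_penalty(model, fisher, general_params, lam):
    """lam * sum_i F_i * (theta_i - theta_i^g)^2, summed over all parameters."""
    penalty = 0.0
    for name, param in model.named_parameters():
        penalty = penalty + (fisher[name] * (param - general_params[name]) ** 2).sum()
    return lam * penalty

# During continued training on in-domain batches:
#   loss = in_domain_cross_entropy + ewc_penalty(model, fisher, general_params, lam=0.5)
```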
Training Models with EWC
[Diagram: General domain model → approximate Fisher matrix → regularization during continued training → EWC model.]
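One common recipe for the "approximate Fisher matrix" step is the empirical diagonal Fisher: average the squared gradients of the loss over general-domain batches. A sketch under that assumption (the exact recipe used here may differ):

```python
import torch

def approximate_diagonal_fisher(model, general_batches, loss_fn):
    """Empirical diagonal Fisher: mean squared gradient over general-domain data."""
    fisher = {name: torch.zeros_like(p) for name, p in model.named_parameters()}
    num_batches = 0
    for src, tgt in general_batches:
        model.zero_grad()
        loss_fn(model, src, tgt).backward()
        for name, p in model.named_parameters():
            if p.grad is not None:
                fisher[name] += p.grad.detach() ** 2
        num_batches += 1
    return {name: f / max(num_batches, 1) for name, f in fisher.items()}

# Snapshot the general-domain parameters to anchor the EWC penalty:
# general_params = {name: p.detach().clone() for name, p in model.named_parameters()}
```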
German Patent EWC Results
[Scatter: general-domain BLEU (5-35) against patent BLEU (35-65). The general-domain model is the starting point ("Start here"); plain continued training reaches high patent BLEU at a large cost in general-domain BLEU ("Continue train to here…"); the goal is to score well on both ("Want to be here!"). Points for EWC regularization weights 2.0, 1.0, 0.75, 0.5, 0.25, 0.125, and 0.0 (plain CT) trace out the tradeoff.]
Conclusion
- Continued training is a powerful means to adapt a high-performing general model to new domains with a small amount of new data.
- EWC allows fine-tuning the tradeoff between general and in-domain performance.
Other Results

BLEU on the in-domain and general-domain test sets, for EWC, plain continued training (Cont.), and the general-domain model (General):

                   In-domain BLEU           General-domain BLEU
                   EWC    Cont.  General    EWC    Cont.  General
German TED         39.95  39.90  34.59      29.19  27.56  29.54
German Patent      57.98  62.28  35.95      23.52   7.88  29.54
Russian TED        28.67  28.60  23.40      27.01  25.19  28.30
Russian Patent     37.28  37.00  23.40      23.55  10.44  28.30
Korean TED         16.76  17.20  11.60      10.61   9.85  11.80
Korean Patent      29.85  31.70   2.70       2.06   0.48  11.80
Example
Source: für die anschlussunterbringung sind die kommunen zuständig .
Reference: communes themselves are responsible for subsequent accommodation .
Cont. Train: the networks are responsible for the connection of the connection .
EWC: the municipalities are responsible for the connection accommodation .
Practical Examples
Extra Slides
[Diagrams: NMT architecture for "Wasch dir die Hände" → "Wash your hands", highlighting the source side (source embedding, encoder) and the target side (target embedding, decoder, softmax).]
Hyperparameters

Model architecture
- num_embed="512:512"
- rnn_num_hidden=512
- rnn_attention_type="dot"
- num_layers=2
- rnn_cell_type="lstm"

Regularization
- embed_dropout=0.0
- rnn_dropout=0.1
- label_smoothing=0.1

Vocabulary
- BPE on source and target
- num_words=30k:30k
- word_min_count="1:1"
- max_seq_len="100:100"

Training configuration
- batch_size=4096
- optimizer=adam
- initial_learning_rate=0.0003
- learning_rate_reduce_factor=0.7
- loss="cross-entropy"
- checkpoint_frequency=4000
Alternate MT explanation
Case Study
Our office needs to translate a lot of Russian patents.
- We have a few translators, but they can only process a small fraction of our data.
- We would like to use machine translation to find the most interesting documents and let our translators focus on those.
- We know neural machine translation has state-of-the-art performance, so we decide to build a neural system…
MT training
[Diagrams: four training setups.
- General Domain Data (50M sentence pairs) → General Domain NMT Model
- In-domain Data → In-Domain NMT Model
- General Domain Data + In-domain Data → Mixed Domain NMT Model
- General Domain NMT Model + In-domain Data → Continued-Training NMT Model]
Continued Training
[Diagram: Randomly Initialized NMT Model → (train on general-domain data) → General Domain NMT Model → (continue training on in-domain data) → Domain Adapted NMT Model.]
Keyword Search
Keyword Search (sort of)
Extrinsic measure of MT Output quality based on ability to retrieve (i.e., match) words or phrases
Human-assigned categories:
- Keyword: venture capitalist, zero gravity, hydrogen
- Sentiment: fantastic, messy, bad, happy
- Person: Heidi, Chris, Leonardo da Vinci, Aristotle
- Organization: Toyota, UNESCO, Ikea, Swedish Army
- Geo-Political Entity: Egypt, San Francisco, Haiti
- Location: Arctic, Africa, hospital, ER, lobby
- Date: Friday, 1980s, last March, today
- Temporal Expression: 4:00 am, 30-second, six weeks
- Numeric Expression: 20 percent, 27 kilometers, one-fifth, two nurses
This metric is pessimistic:
- Inexact matches count as failure
- Tokenization issues exacerbate measures (70 year old vs. 70-year-old)
- Alternative (perfectly acceptable) translations can count as failure
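To make "pessimistic" concrete, here is a toy version of such a matching-based score: micro-averaged F1 where a keyword only counts if it appears verbatim, so paraphrases and tokenization differences score zero. The names and matching rules are assumptions for illustration, not the evaluation's actual implementation:

```python
def keyword_f1(system_outputs, references, keywords):
    """Micro-averaged F1 over exact (verbatim substring) keyword matches."""
    tp = fp = fn = 0
    for sys_out, ref in zip(system_outputs, references):
        for kw in keywords:
            in_ref, in_sys = kw in ref, kw in sys_out
            tp += int(in_ref and in_sys)
            fp += int(in_sys and not in_ref)
            fn += int(in_ref and not in_sys)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# "70-year-old" in the reference vs. "70 year old" in the output: an exact
# matcher counts this as a miss, which is why the metric is pessimistic.
print(keyword_f1(["a 70 year old man"], ["a 70-year-old man"], ["70-year-old"]))  # 0.0
```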
Results
Russian Patent
[Chart: BLEU for General Domain, In-Domain, and Continued Training. Continued training: +10.1 BLEU.]
Russian Patent
[Chart: as above, adding Online A. Continued training: +7.2 BLEU.]
Patent Results
[Chart: BLEU for German, Korean, Russian, Chinese, comparing Domain Adapted SMT, Online A, and NMT Continued Training. Gains: +11.3, +4.6, +7.2, +9.8 BLEU.]
TED Results
[Chart: BLEU for Arabic, German, Farsi, Korean, Russian, Chinese, comparing Domain Adapted SMT, Online A, and Continued Training. Deltas: +1.1, +1.4, +0.0, -0.6, -0.7, +7.0 BLEU.]
TED results
Training data             Ar    De    Fa    Ko    Ru    Zh
SMT General Domain       24.0  31.0  13.9   6.7  25.0  15.2
SMT Mixed Domain         27.8  31.9  18.2  10.7  25.7  16.1
NMT General Domain       29.6  34.6  22.2  11.6  23.4  15.9
NMT In Domain (TED)      27.4  32.3  21.3  14.4  22.9  16.2
NMT Mixed Domain          --   35.6   --    --   24.5  17.8
NMT Continued Training   35.4  39.9  27.9  17.2  28.6  20.4
Microsoft Translator     34.3  38.5  20.9  17.9  28.6  21.0
Patent results
Training data             De    Ko    Ru    Zh
SMT General Domain       26.6   2.4  21.4  13.7
SMT Mixed Domain         50.6  21.7  29.0  29.8
NMT General Domain       36.0   2.7  23.4  12.6
NMT In Domain (Patent)   61.9  29.9  26.9  40.2
NMT Mixed Domain         58.4   --   27.7  33.7
NMT Continued Training   62.3  31.7  37.0  43.7
Microsoft Translator     51.0  27.1  29.8  33.9
[Learning curves: BLEU vs. amount of in-domain training data.
- Patent, German/Korean/Russian/Chinese (1,000-8,000 sentence pairs)
- TED, Russian (50,000-150,000)
- TED, Arabic/German/Farsi/Korean/Russian/Chinese (10,000-60,000)
- TED, Arabic/German/Farsi/Korean/Russian/Chinese (1,000-8,000)]
Human Eval
Continued Training vs General
[Charts: preference share (percent) and counts for Arabic, Korean, Chinese: Continued Training / Tie / General.]

Continued Training vs Human
[Charts: preference share (percent) and counts for Arabic, Korean, Chinese: Continued Training / Tie / Human.]
Human Evaluation
- Trends similar across three languages
- System differences consistent with BLEU
- Human reference (unsurprisingly) better
Research Extensions
Parameter Freezing
[Chart: BLEU by component, relative to the full continued-training baseline of 34.72 (general-domain model with no continued training: 23.32, i.e. -11.4).
- Freezing one component, training the rest: Encoder 34.54 (-0.2), Decoder 35.08 (+0.4), Source Embed 34.25 (-0.5), Target Embed 34.66 (-0.1), Softmax 34.97 (+0.3)
- Training only one component, freezing the rest: Encoder 34.55 (-0.2), Decoder 33.19 (-1.5), Source Embed 33.89 (-0.8), Target Embed 27.37 (-7.4), Softmax 32.65 (-2.1)]
Data Sizes
Datasets
Dataset sizes (# sentence/segment pairs for the in-domain sets):
- Large General Domain: Ar-En 49M (Subtitle, UN, LDC); De-En 28M (Subtitle, WMT); Fa-En 6M (Subtitle); Ko-En 1M (Subtitle); Ru-En 51M (Subtitle, WMT); Zh-En 36M (Subtitle, WMT)
- TED Talks: Ar 175k, De 152k, Fa 114k, Ko 164k, Ru 180k, Zh 170k
- Patent (WIPO): De 821k, Ko 81k, Ru 39k, Zh 154k

Goal: improve test results on TED/Patent using both the large general-domain data and some in-domain data
TED Data
[Same general-domain corpora as above; in-domain (TED): Ar 175k, De 152k, Fa 114k, Ko 164k, Ru 180k, Zh 170k sentence pairs.]
Example sentences: "So, um... she 's kidding..." / "Resumption of the session" / "The European Union supports humanitarian action." / "Allison Hunt: My three minutes hasn't started yet, has it?"
Patent Data
[Same general-domain corpora as above; in-domain (Patent): De 821k, Ko 81k, Ru 39k, Zh 154k sentence pairs.]
Example sentences: "So, um... she 's kidding..." / "Resumption of the session" / "The European Union supports humanitarian action." / "The tablets exhibit improved bioavailability of the active ingredient."
OOV rates
TED OOVs (type count)
Training data     Arabic  German  Farsi  Korean  Russian  Chinese
General Domain       133     204    445     225      140       45
In Domain (TED)      745     700    758     249      813      422
Both domains         126     193    329     133      132       43
Total types         8248    5837   6261    4989     7954     5760

TED OOVs (token count)
Training data     Arabic  German  Farsi  Korean  Russian  Chinese
General Domain       176     235    597     316      153       47
In Domain (TED)      840     809    956     327      933      536
Both domains         168     221    418     187      143       45
Total tokens       28636   35209  39223   45715    31575    33397

TED OOVs (type %)
Training data     Arabic  German  Farsi  Korean  Russian  Chinese
General Domain     1.61%   3.49%  7.11%   4.51%    1.76%    0.78%
In Domain (TED)    9.03%  11.99% 12.11%   4.99%   10.22%    7.33%
Both domains       1.53%   3.31%  5.25%   2.67%    1.66%    0.75%

TED OOVs (token %)
Training data     Arabic  German  Farsi  Korean  Russian  Chinese
General Domain     0.61%   0.67%  1.52%   0.69%    0.48%    0.14%
In Domain (TED)    2.93%   2.30%  2.44%   0.72%    2.95%    1.60%
Both domains       0.59%   0.63%  1.07%   0.41%    0.45%    0.13%
Patent OOVs (type count)
Training data        German  Korean  Russian  Chinese
General Domain         5290    2098     1508      495
In Domain (Patent)     2331     986     4286     1085
Both domains           2100     594     1262      339
Total types           14566    7939    15964     8627

Patent OOVs (token count)
Training data        German  Korean  Russian  Chinese
General Domain        10264    7748     1980     1171
In Domain (Patent)     3864    1724     5715     2061
Both domains           3528    1045     1617      681
Total tokens         132208  186832    81911   135591

Patent OOVs (type %)
Training data        German  Korean  Russian  Chinese
General Domain       36.32%  26.43%    9.45%    5.74%
In Domain (Patent)   16.00%  12.42%   26.85%   12.58%
Both domains         14.42%   7.48%    7.91%    3.93%

Patent OOVs (token %)
Training data        German  Korean  Russian  Chinese
General Domain        7.76%   4.15%    2.42%    0.86%
In Domain (Patent)    2.92%   0.92%    6.98%    1.52%
Both domains          2.67%   0.56%    1.97%    0.50%
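For reference, type and token OOV rates as tabulated above can be computed in a few lines (a generic sketch; the exact tokenization behind these tables is not shown here):

```python
def oov_rates(train_tokens, test_tokens):
    """Return (type OOV rate, token OOV rate) of a test set against a training vocabulary."""
    vocab = set(train_tokens)
    test_types = set(test_tokens)
    oov_types = test_types - vocab
    oov_token_count = sum(1 for tok in test_tokens if tok not in vocab)
    return len(oov_types) / len(test_types), oov_token_count / len(test_tokens)

train = "the cat sat on the mat".split()
test = "the dog sat on the log".split()
print(oov_rates(train, test))  # (0.4, 0.333...): 'dog' and 'log' are unseen
```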
[Bar charts of the OOV tables above: TED OOV type/token counts and percentages, and Patent OOV type/token counts and percentages, by training data (General Domain / In Domain / Both domains).]