The Sockeye Neural MT Toolkit at AMTA 2018

SLIDE 1

The Sockeye Neural MT Toolkit at AMTA 2018

Felix Hieber, Tobias Domhan, Michael Denkowski, David Vilar, Artem Sokolov, Ann Clifton, Matt Post github.com/awslabs/sockeye

SLIDE 2

Why Sockeye?

Sockeye is:

  • A production-ready framework for training state-of-the-art models
  • A flexible experimentation platform for researchers

Motivation: rapid evolution of Neural MT—different toolkits with different features

  • No single toolkit with everything we need at Amazon
  • Nothing mature for MXNet, our framework of choice

Decision: build such a toolkit—Sockeye

  • Highly scalable (multiple GPUs, large data)
  • Free and open source software (Apache 2.0)

Named after the sockeye salmon found in the northern Pacific Ocean (a favorite fish around Seattle, WA)

SLIDE 3

Quick Start

A translation system in 3 slides

SLIDE 4

Sequence-to-Sequence Modeling

Language model conditioned on the source sentence $x = x_1, \ldots, x_S$:

$$p(y \mid x) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}, x)$$

  • Encode source sentence
  • Decode target sentence
  • Attention connects states across steps

Many instantiations:

  • Recurrent
  • Convolutional
  • Self-attentional

[Figure: an encoder reads the source "la casa blanca"; a decoder, starting from <BOS>, generates "the white house <EOS>".]
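The factorization above can be made concrete with a minimal Python sketch. It assumes a hypothetical `next_token_dist(source, prefix)` interface that returns next-token probabilities as a dictionary; a real Sockeye model computes this distribution with an encoder, a decoder, and attention, none of which the toy lookup below models.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical model interface: p(y_t | y_{1:t-1}, x) as a dict over tokens.
NextTokenDist = Callable[[Sequence[str], Sequence[str]], Dict[str, float]]


def sequence_log_prob(model: NextTokenDist, source: Sequence[str],
                      target: Sequence[str]) -> float:
    """log p(y | x) = sum_t log p(y_t | y_{1:t-1}, x), by the chain rule."""
    log_p = 0.0
    for t, token in enumerate(target):
        dist = model(source, target[:t])  # condition on source and target prefix
        log_p += math.log(dist[token])
    return log_p


def toy_model(source: Sequence[str], prefix: Sequence[str]) -> Dict[str, float]:
    """Toy stand-in: ignores the source and puts 0.9 on the next reference word."""
    reference = ["the", "white", "house", "<EOS>"]
    expected = reference[min(len(prefix), len(reference) - 1)]
    dist = {w: 0.1 / (len(reference) - 1) for w in reference if w != expected}
    dist[expected] = 0.9
    return dist


if __name__ == "__main__":
    lp = sequence_log_prob(toy_model, ["la", "casa", "blanca"],
                           ["the", "white", "house", "<EOS>"])
    print(math.exp(lp))  # approximately 0.6561, i.e. 0.9 ** 4
```

Working in log space turns the product into a sum, which avoids numerical underflow for long sentences.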

SLIDE 5

Data Pre-Processing

Given raw parallel text:

    The shares closed almost unchanged at 187.35 dollars.
    The question comes alone: Collserola? Park or mountain?

Step 1 – Tokenize:

    The shares closed almost unchanged at 187.35 dollars .
    The question comes alone : Collserola ? Park or mountain ?

Step 2 – Sub-word encode:

    The share@@ s closed a@@ lmost un@@ chang@@ ed at 18@@ 7@@ .@@ 35 dollar@@ s .
    The question comes alone : Co@@ ll@@ s@@ er@@ ola ? Park or mountain ?

Ready for training!
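The `@@` markers above flag sub-words that continue into the next piece, so the segmentation can be reversed after decoding. Below is a minimal Python sketch of both directions. It assumes the word-to-pieces split is already known: the hypothetical `segments` table stands in for a learned BPE model, which is not shown. The `@@ ` convention and the undo regex match those used by the subword-nmt tool.

```python
import re
from typing import Dict, List


def apply_segmentation(tokens: List[str], segments: Dict[str, List[str]]) -> str:
    """Split tokens into sub-words, marking every non-final piece with '@@'.

    `segments` is a hypothetical precomputed lookup (word -> pieces);
    learning the merges that produce it (e.g. with BPE) is not shown here.
    """
    out: List[str] = []
    for token in tokens:
        pieces = segments.get(token, [token])  # unknown words stay whole
        out.extend(piece + "@@" for piece in pieces[:-1])
        out.append(pieces[-1])
    return " ".join(out)


def remove_segmentation(text: str) -> str:
    """Undo sub-word encoding after decoding by joining '@@ '-marked pieces."""
    return re.sub(r"(@@ )|(@@ ?$)", "", text)


if __name__ == "__main__":
    segments = {
        "shares": ["share", "s"],
        "Collserola": ["Co", "ll", "s", "er", "ola"],
    }
    encoded = apply_segmentation("The shares closed".split(), segments)
    print(encoded)                       # The share@@ s closed
    print(remove_segmentation(encoded))  # The shares closed
```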

SLIDE 6

Running Sockeye

Install Sockeye:

    pip install sockeye

Train with default settings:

    python -m sockeye.train \
        --source train-corpus.de \
        --target train-corpus.en \
        --validation-source dev-corpus.de \
        --validation-target dev-corpus.en \
        --output model.de-en

Decode with default settings:

    python -m sockeye.translate \
        --models model.de-en

Customization?

SLIDE 7

Architectures & Features

Customizing translation systems

SLIDE 8

Base Architectures

Sockeye supports 3 prominent architectures:

  • Attentional Recurrent [Bahdanau et al., 2014; Luong et al., 2015]
  • Fully Convolutional [Gehring et al., 2017]
  • Self-Attentional Transformer [Vaswani et al., 2017]

Mix and match encoders and decoders with the --encoder and --decoder options.

SLIDE 9

Attention Types

  • MLP [Bahdanau et al., 2014]: $v_a^\top \tanh(W_u s + W_v h)$
  • Dot [Luong et al., 2015]: $s^\top h$
  • Location [Luong et al., 2015]: $v_a s$
  • Bilinear [Luong et al., 2015]: $s^\top W h$
  • Coverage [Tu et al., 2015]: $v_a^\top \tanh(W_u s + W_v h + W_c C)$
  • Multi-head [Vaswani et al., 2017]: $\mathrm{softmax}\left(\frac{s W^Q_u \, (h W^K_u)^\top}{\sqrt{d_u}}\right) h W^V_u$

Here $s$ is the current decoder state, $h$ an encoder state, and $C$ the coverage vector.

Sockeye supports a range of attention models (currently limited to RNN encoders/decoders)
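To show how the score functions above turn into attention weights and a context vector, here is a minimal NumPy sketch of the dot, bilinear, and MLP variants for a single decoder state. It is an illustration under simplifying assumptions (no batching, no masking, no coverage or multi-head), not Sockeye's MXNet implementation; the shapes of `W_u`, `W_v`, and `v_a` are chosen here for clarity.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def dot_scores(s: np.ndarray, H: np.ndarray) -> np.ndarray:
    """Dot: s^T h_i for every encoder state h_i (the rows of H)."""
    return H @ s


def bilinear_scores(s: np.ndarray, H: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Bilinear: s^T W h_i, computed as h_i . (W^T s)."""
    return H @ (W.T @ s)


def mlp_scores(s: np.ndarray, H: np.ndarray, W_u: np.ndarray,
               W_v: np.ndarray, v_a: np.ndarray) -> np.ndarray:
    """MLP: v_a^T tanh(W_u s + W_v h_i); W_u s is broadcast over positions.

    Assumed shapes: W_u (k, d_s), W_v (k, d_h), v_a (k,).
    """
    return np.tanh(W_u @ s + H @ W_v.T) @ v_a


def attend(scores: np.ndarray, H: np.ndarray):
    """Normalize scores into attention weights and build the context vector."""
    weights = softmax(scores)
    return weights, weights @ H


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(5, 4))  # 5 encoder states of size 4
    s = rng.normal(size=4)       # current decoder state
    weights, context = attend(dot_scores(s, H), H)
    print(weights.round(3), context.shape)  # 5 weights summing to 1; shape (4,)
```

Bilinear attention with an identity matrix reduces to dot attention, which is a convenient sanity check.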

SLIDE 10

Training

Recommended model training recipe:

  • Adam optimizer with learning rate scheduler
  • Learning rate reduces when dev perplexity plateaus
  • Decay resets model and optimizer parameters to best point
  • Early stopping on extended dev perplexity plateau
  • Average model parameters from best checkpoints

SLIDE 11

Training

Recommended model training recipe:

  • Adam optimizer with learning rate scheduler
  • Optimizers: SGD, Nadam [Dozat, 2015], Eve [Koushik and Hayashi, 2016], etc.
  • Multi-GPU parallelization with sentence or word-based batching
  • Training resumption, sharding + serialized preprocessed data
  • Factored input [Sennrich and Haddow, 2016]
  • Learning rate reduces when dev perplexity plateaus
  • Fixed-step, inverse-square-root [Vaswani et al., 2017] and more
  • Decay resets model and optimizer parameters to best point
  • Alternatively restart optimizer (momentum) from zero [Denkowski and Neubig, 2017]
  • Early stopping on extended dev perplexity plateau
  • Alternatively track BLEU, chrF [Popović, 2015], etc., or train for set number of updates
  • Average model parameters from best checkpoints
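The control flow of this recipe (decay on a dev-perplexity plateau, reset to the best point, stop after an extended plateau, then average the best checkpoints) can be sketched in plain Python. The class, its method names, and every default value below are illustrative assumptions, not Sockeye's actual settings or API. Reloading the model and optimizer state is left to the caller.

```python
from typing import Dict, List, Optional

import numpy as np


class PlateauController:
    """Toy controller for the training recipe; all defaults are assumptions."""

    def __init__(self, lr: float = 2e-4, factor: float = 0.7,
                 reduce_patience: int = 3, stop_patience: int = 8) -> None:
        self.lr = lr
        self.factor = factor                    # multiply lr by this on decay
        self.reduce_patience = reduce_patience  # bad checkpoints before decay
        self.stop_patience = stop_patience      # bad checkpoints before stopping
        self.best_ppl = float("inf")
        self.best_checkpoint: Optional[int] = None
        self.since_best = 0   # checkpoints since the last improvement
        self.since_decay = 0  # checkpoints since the last improvement or decay

    def observe(self, checkpoint: int, dev_ppl: float) -> str:
        """Return 'improved', 'continue', 'decay', or 'stop'."""
        if dev_ppl < self.best_ppl:
            self.best_ppl, self.best_checkpoint = dev_ppl, checkpoint
            self.since_best = self.since_decay = 0
            return "improved"
        self.since_best += 1
        self.since_decay += 1
        if self.since_best >= self.stop_patience:
            return "stop"  # extended plateau: early stopping
        if self.since_decay >= self.reduce_patience:
            self.lr *= self.factor
            self.since_decay = 0
            # The caller would now reload model and optimizer state
            # from self.best_checkpoint before continuing.
            return "decay"
        return "continue"


def average_checkpoints(params: List[Dict[str, np.ndarray]]) -> Dict[str, np.ndarray]:
    """Element-wise mean of each named parameter across checkpoints."""
    return {name: np.mean([p[name] for p in params], axis=0) for name in params[0]}


if __name__ == "__main__":
    ctrl = PlateauController()
    for cp, ppl in enumerate([50, 40, 35, 36, 37, 38, 34], start=1):
        print(cp, ctrl.observe(cp, ppl), ctrl.lr)
```

Keeping two counters lets the learning rate decay several times within one plateau while the early-stopping clock only resets on a real improvement.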

SLIDE 12

Monitoring

Monitor training with standalone TensorBoard:

  • BLEU, chrF, and perplexity curves for different model configurations
  • Easily add and track new metrics
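Two of the tracked metrics are simple to compute. Perplexity is the exponential of the mean per-token negative log-likelihood. chrF [Popović, 2015] is an F-score over character n-gram precision and recall. The Python sketch below is a simplified sentence-level version: whitespace is dropped, precision and recall are averaged over n = 1..6, and orders with no n-grams are skipped. It is an illustration, not the exact implementation Sockeye reports, and an established scorer should be preferred for published numbers.

```python
import math
from collections import Counter
from typing import List


def perplexity(token_nlls: List[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural log)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def _char_ngrams(text: str, n: int) -> Counter:
    chars = text.replace(" ", "")  # chrF ignores whitespace
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def chrf(hypothesis: str, reference: str, max_n: int = 6, beta: float = 2.0) -> float:
    """Sentence-level chrF in [0, 1] (simplified sketch)."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = _char_ngrams(hypothesis, n), _char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # sentence too short for this n-gram order
        matches = sum((hyp & ref).values())  # clipped n-gram matches
        precisions.append(matches / sum(hyp.values()))
        recalls.append(matches / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    # beta = 2 weights recall more heavily than precision
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


if __name__ == "__main__":
    print(perplexity([math.log(2.0)] * 3))               # approximately 2.0
    print(chrf("the white house", "the white house"))    # 1.0
```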

SLIDE 13

Decoding

Primary decoding features:

  • Length-normalized beam search
  • Parametrized length penalty [Wu et al., 2016]
  • Target vocabulary selection [Devlin, 2017]
  • Efficient GPU and CPU support
  • Length-based batch decoding
  • Ensemble multiple models, including different architectures
  • Visualize system output, such as attention matrices (alignments)
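Below is a minimal sketch of beam search with a parametrized length penalty. A hypothetical `next_log_probs(prefix)` function stands in for a real model (an ensemble could be plugged in by combining several models' distributions inside it). The penalty follows the form of Wu et al. (2016), lp(Y) = ((β + |Y|) / (β + 1))^α. The paper fixes β = 5, so exposing β as a parameter is an assumption of this sketch. With α = 1 and β = 0 it reduces to dividing the log-probability by the hypothesis length. For brevity the active beam is allowed to shrink as hypotheses finish.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface: next-token log-probabilities given a prefix.
NextLogProbs = Callable[[Sequence[str]], Dict[str, float]]


def length_penalty(length: int, alpha: float = 1.0, beta: float = 0.0) -> float:
    """((beta + |Y|) / (beta + 1)) ** alpha; Wu et al. (2016) use beta = 5."""
    return ((beta + length) / (beta + 1.0)) ** alpha


def beam_search(next_log_probs: NextLogProbs, beam_size: int = 3, max_len: int = 10,
                alpha: float = 1.0, beta: float = 0.0) -> List[str]:
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]  # (log-prob, tokens)
    finished: List[Tuple[float, List[str]]] = []
    for _ in range(max_len):
        candidates = [(score + logp, tokens + [tok])
                      for score, tokens in beams
                      for tok, logp in next_log_probs(tokens).items()]
        # Keep the k best expansions by raw log-probability.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, tokens in candidates[:beam_size]:
            (finished if tokens[-1] == "<EOS>" else beams).append((score, tokens))
        if not beams:
            break
    finished.extend(beams)  # unfinished hypotheses if max_len was reached
    # Rank hypotheses by length-penalized score.
    best = max(finished, key=lambda c: c[0] / length_penalty(len(c[1]), alpha, beta))
    return best[1]


def toy(prefix: Sequence[str]) -> Dict[str, float]:
    """Toy model: ending immediately is as likely as starting with 'the'."""
    table = {
        (): {"the": math.log(0.5), "<EOS>": math.log(0.5)},
        ("the",): {"house": math.log(0.6), "<EOS>": math.log(0.4)},
    }
    return table.get(tuple(prefix), {"<EOS>": 0.0})


if __name__ == "__main__":
    print(beam_search(toy, alpha=0.0))  # ['<EOS>']
    print(beam_search(toy, alpha=1.0))  # ['the', 'house', '<EOS>']
```

With the toy model, the raw log-probability (α = 0) prefers the empty translation, while length normalization (α = 1) prefers `the house <EOS>`. This is the short-output bias that the penalty is meant to counter.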

SLIDE 14

Decoding

Visualize beam search history

Adding new features?

SLIDE 15

Development

Adding your code to Sockeye

SLIDE 16

Starting with MXNet

Fast and scalable deep learning framework

  • Native support for parallelization of training
  • Near linear speedup with multiple GPUs

Flexible programming model

  • Imperative API (NumPy on GPUs)
  • Symbolic API (computation graphs)

Bindings for various languages (Python, C++, Scala, R, Julia, Perl)

Officially supported by Amazon/AWS

  • Quick start with Amazon Deep Learning AMI

SLIDE 17

MXNet Programming Models

Imperative

  • Like NumPy, but GPU backend

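    import mxnet as mx
    x = mx.nd.zeros((64, 12))          # batch of 64 inputs with 12 features
    weights = mx.nd.zeros((128, 12))   # (num_hidden, input_dim)
    x = mx.nd.FullyConnected(x, weights, num_hidden=128, no_bias=True)
    pred = mx.nd.SoftmaxActivation(x)  # evaluated immediately
    pred = pred.asnumpy()              # copy the result into a NumPy array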

Symbolic

  • Optimized computation graph, auto-diff

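    import mxnet as mx
    y = mx.sym.Variable('y')
    x = mx.sym.Variable('x')
    weights = mx.sym.Variable('w')
    x = mx.sym.FullyConnected(x, weights, num_hidden=128)  # bias variable created implicitly
    pred = mx.sym.SoftmaxOutput(x, y)                        # builds the graph; nothing is computed yet
    model = mx.mod.Module(pred, data_names=['x'], label_names=['y'])
    model.fit(…)
    model.forward_backward(data)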

SLIDE 18

Implementation

Training (symbolic):

  • Unroll models through time to maximum sequence length
  • Organize data into buckets of similar length
  • One symbolic graph per bucket, with shared memory and parameters

Inference (symbolic and imperative):

  • Symbolic: encode source sequence
  • Imperative: iteratively generate target
  • Beam search decoder maintains and expands the k-best hypotheses at each step until <EOS>

[Figure: bucketed batches, with axes batch size and sequence length]
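Bucketing gives every batch in a bucket the same fixed shape, which is what allows one pre-built graph per bucket. Below is a minimal Python sketch of the grouping step. It assumes single-sided buckets with a fixed step and a hypothetical `<PAD>` token. Sockeye's own bucketing works on source/target length pairs and differs in detail.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Sequence


def define_buckets(max_len: int, step: int = 10) -> List[int]:
    """Bucket upper bounds: step, 2*step, ..., always ending at max_len."""
    return list(range(step, max_len, step)) + [max_len]


def bucket_sentences(sentences: Sequence[List[str]],
                     buckets: List[int]) -> Dict[int, List[List[str]]]:
    """Put each sentence in the smallest bucket that fits, padded to that size."""
    grouped: Dict[int, List[List[str]]] = defaultdict(list)
    for tokens in sentences:
        idx = bisect_left(buckets, len(tokens))  # first bucket >= length
        if idx == len(buckets):
            continue  # longer than max_len: dropped in this sketch
        size = buckets[idx]
        grouped[size].append(tokens + ["<PAD>"] * (size - len(tokens)))
    return dict(grouped)


if __name__ == "__main__":
    buckets = define_buckets(25, step=10)  # [10, 20, 25]
    data = [["a"] * 3, ["b"] * 12, ["c"] * 10]
    for size, batch in sorted(bucket_sentences(data, buckets).items()):
        print(size, len(batch))  # bucket size, number of sentences in it
```

Smaller steps waste less computation on padding but require more graphs. Sharing parameters across the graphs keeps that cost to memory rather than model size.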

SLIDE 19

Developing Sockeye

Official Amazon software on GitHub

SLIDE 20

Developing Sockeye

Developer guidelines for reliable, understandable code:

  • Python3 with type annotations and Sphinx-style doc strings
  • Comprehensive unit and system tests
  • Peer review and code documentation
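As an illustration of the first two guidelines, here is a short Python 3 function with type annotations, a Sphinx-style (`:param:` / `:return:`) docstring, and a matching unit test. The `count_tokens` helper is hypothetical and not part of Sockeye.

```python
from typing import List


def count_tokens(sentences: List[str]) -> int:
    """
    Counts whitespace-separated tokens over a list of sentences.

    :param sentences: Input sentences, one string each.
    :return: Total number of tokens.
    """
    return sum(len(sentence.split()) for sentence in sentences)


def test_count_tokens() -> None:
    # pytest-style unit test: picked up by pytest when placed in a test_*.py file
    assert count_tokens([]) == 0
    assert count_tokens(["the white house", "la casa blanca"]) == 6
```
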
SLIDE 21

Developing Sockeye

Public code review process: community feedback welcome!

SLIDE 23

WMT17 News Translation Task

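| System        | Architecture | EN→DE | LV→EN |
|---------------|--------------|-------|-------|
| FairSeq       | CNN          | 23.37 | 15.38 |
| Marian        | RNN          | 25.93 | 16.19 |
| Marian        | Transformer  | 27.41 | 17.58 |
| Nematus       | RNN          | 23.78 | 14.70 |
| Neural Monkey | RNN          | 13.73 | 10.54 |
| OpenNMT       | RNN          | 22.69 | 13.85 |
| OpenNMT-py    | RNN          | 21.95 | 13.55 |
| Tensor2Tensor | Transformer  | 26.34 | 17.67 |
| Sockeye       | CNN          | 24.59 | 15.82 |
| Sockeye       | RNN          | 25.55 | 15.92 |
| Sockeye       | Transformer  | 27.50 | 18.06 |

Scores: WMT BLEU (cased)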