

SLIDE 1


Sequence-to-Sequence Learning with Neural Networks

Ilya Sutskever, Oriol Vinyals, Quoc V. Le, NIPS 2014 Introduced by Graham Neubig, NAIST

2014-11-01

SLIDE 2


Review: Recurrent Neural Networks

SLIDE 3


Perceptron

y = f( ∑_{i=1}^{I} w_i · φ_i(x) )

Example bag-of-words features φ for one sentence:
φ“A” = 1, φ“site” = 1, φ“,” = 2, φ“located” = 1, φ“in” = 1, φ“Maizuru” = 1, φ“Kyoto” = 1, φ“priest” = 0, φ“black” = 0
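The weighted-sum-then-activation above can be sketched in a few lines of Python; the feature weights and the step activation here are illustrative, not values from the slides:

```python
# Minimal perceptron sketch: y = f(sum_i w_i * phi_i(x)), with sparse
# bag-of-words features stored in a dict (weights below are made up).
def perceptron(phi, w, f):
    return f(sum(w[k] * v for k, v in phi.items()))

# The slide's example features for one sentence.
phi = {"A": 1, "site": 1, ",": 2, "located": 1, "in": 1,
       "Maizuru": 1, "Kyoto": 1}
w = {"A": 0.1, "site": 0.5, ",": -0.2, "located": 0.3,
     "in": 0.0, "Maizuru": 1.0, "Kyoto": 1.0}
step = lambda a: 1 if a > 0 else -1   # simple step activation
print(perceptron(phi, w, step))  # -> 1
```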

SLIDE 4


Neural Net

  • Combine multiple perceptrons


  • Learning of complex functions possible
SLIDE 5


Recurrent Neural Nets

[Figure: an RNN unrolled over the sentence “I can eat an apple </s>”, reading each word and predicting the next, with each hidden state feeding into the next step]
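A vanilla (Elman-style) RNN step like the one unrolled in the figure can be sketched as follows; the tiny 2-dimensional embeddings and weights are our own illustration, not the paper's model:

```python
import math

def rnn_step(x, h, W_xh, W_hh):
    """One vanilla RNN step: h' = tanh(W_xh @ x + W_hh @ h)."""
    return [math.tanh(sum(wx * xi for wx, xi in zip(row_x, x)) +
                      sum(wh * hi for wh, hi in zip(row_h, h)))
            for row_x, row_h in zip(W_xh, W_hh)]

# Toy embeddings for the first few tokens of "I can eat an apple </s>".
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_xh = [[0.5, -0.3], [0.2, 0.4]]   # input-to-hidden weights (illustrative)
W_hh = [[0.1, 0.0], [0.0, 0.1]]    # hidden-to-hidden recurrence
h = [0.0, 0.0]
for x in tokens:
    h = rnn_step(x, h, W_xh, W_hh)  # the hidden state summarizes the history
print(h)
```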

SLIDE 6


Long Short-Term Memory [Hochreiter+ 97]

  • Problem: RNNs suffer from vanishing gradients
  • Solution: create gated units that decide when to activate, so information can be preserved over long spans
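A scalar sketch of one LSTM cell step, to make the gating idea concrete. This is the textbook formulation with made-up weights, not the paper's 4-layer, 1,000-cell configuration:

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def lstm_step(x, h, c, w):
    """One scalar LSTM step: gates decide what to forget, write, expose."""
    i = sigmoid(w["i"] * x + w["ui"] * h)    # input gate
    f = sigmoid(w["f"] * x + w["uf"] * h)    # forget gate
    o = sigmoid(w["o"] * x + w["uo"] * h)    # output gate
    g = math.tanh(w["g"] * x + w["ug"] * h)  # candidate cell value
    c_new = f * c + i * g                    # gated memory update
    h_new = o * math.tanh(c_new)             # gated output
    return h_new, c_new

# Illustrative weights; h is the hidden state, c the protected cell memory.
w = {"i": 1.0, "ui": 0.5, "f": 1.0, "uf": 0.5,
     "o": 1.0, "uo": 0.5, "g": 1.0, "ug": 0.5}
h, c = 0.0, 0.0
for x in [1.0, -1.0, 1.0]:
    h, c = lstm_step(x, h, c, w)
print(h, c)
```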
SLIDE 7


Dialogue State Tracking with RNNs [Henderson+ 14]

[Figure legend] f: features, s: slot, m: memory, p: probability over goals

SLIDE 8


Sequence-to-Sequence Learning with Neural Networks

SLIDE 9


Task: Machine Translation

  • Mapping from input to output sentence

Input: 太郎が花子を訪問した。

Output: Taro visited Hanako.

SLIDE 10


Traditional Method: Phrase-based MT

  • Translate phrases, reorder

Today I will give a lecture on machine translation .

Today → 今日は、 / I will give → を行います / a lecture on → の講義 / machine translation → 機械翻訳 / . → 。

今日は、機械翻訳の講義を行います。

  • Requires alignment, phrase extraction, scoring (phrase, reordering), NP-hard decoding, tuning

SLIDE 11


Proposed Method: Memorize Sequence, Generate Sequence

  • Left-to-right beam search (size 2 was largely sufficient)
  • Also can use for reranking
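The left-to-right beam search used for decoding can be sketched as below. `score_next` is a toy stand-in for the LSTM's next-token distribution, not the real model:

```python
import math

def beam_search(score_next, beam_size=2, max_len=10):
    """Keep the `beam_size` best partial translations at each step."""
    beams = [(["<s>"], 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, lp in beams:
            if tokens[-1] == "</s>":       # finished hypotheses pass through
                candidates.append((tokens, lp))
                continue
            for tok, p in score_next(tokens):
                candidates.append((tokens + [tok], lp + math.log(p)))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0]

# Toy "model": prefers "a", then ends the sentence after 3 tokens.
def score_next(tokens):
    if len(tokens) >= 3:
        return [("</s>", 0.9), ("a", 0.1)]
    return [("a", 0.6), ("b", 0.4)]

best, logprob = beam_search(score_next)
print(best)  # -> ['<s>', 'a', 'a', '</s>']
```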
SLIDE 12


Proposed Method: Reversal Trick

[Figure: the source sentence “A B C” is fed to the encoder in reverse order, “C B A”]
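The trick itself is just reversing the encoder input, which places the first source word next to the first target word and so introduces short-range dependencies that anchor learning:

```python
# Reversal trick: "A B C" is read as "C B A". The average distance between
# corresponding source and target words is unchanged, but the first words
# of each sentence become adjacent across the encoder/decoder boundary.
def encoder_input(source_tokens):
    return list(reversed(source_tokens))

print(encoder_input(["A", "B", "C"]))  # -> ['C', 'B', 'A']
```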

SLIDE 13


Experimental Setup

  • Network details
      • 160,000/80,000-word input/output vocabularies (all other words → UNK)
      • 4 hidden LSTM layers of 1,000 cells
      • 1,000-dimensional word representations
  • Training
      • Stochastic gradient descent
      • 8 GPUs (1 for each hidden layer, 4 for output)
      • 6,300 words per second, 10 days total
  • Data details
      • ~340M words of English-French data from WMT14
SLIDE 14


Results

SLIDE 15


Learned Phrase Representations

SLIDE 16


Effect of Length

SLIDE 17


Examples/Problems with UNK

SLIDE 18


Addressing the Rare Word Problem in Neural Machine Translation [Luong+ 14]

  • Copyable model: label unknown words with indices so they can be copied
  • Positional all model: label all word positions (i = j − d for aligned words e_i, f_j)
  • Positional unk model: label only the positions of unknown words

Copyable:       en: The unk1 portico in unk2 …    fr: Le unkn unk1 de unk2 …
Positional all: en: The <unk> portico in <unk> …  fr: Le pos0 <unk> pos−1 <unk> pos1 de posn <unk> pos−1 …
Positional unk: en: The <unk> portico in <unk> …  fr: Le unkpos1 unkpos-1 de unkpos1 …
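One reading of the positional-unk post-processing, sketched below: each target <unk> carries a relative offset d pointing back at source position j = i + d (our interpretation of the slide's i = j − d), and the aligned source word is then dictionary-translated or copied. The words, dictionary, and function name are ours:

```python
def replace_posunk(target, source, dictionary):
    """Post-process target <unk>s annotated as unkpos<d>, where j = i + d."""
    out = []
    for i, tok in enumerate(target):
        if tok.startswith("unkpos"):
            d = int(tok[len("unkpos"):])          # relative offset annotation
            src_word = source[i + d]              # aligned source word
            out.append(dictionary.get(src_word, src_word))  # translate or copy
        else:
            out.append(tok)
    return out

# Toy example in the spirit of the slide (words and dictionary invented).
source = ["Le", "portique", "en", "granit"]
target = ["The", "unkpos0", "in", "unkpos0"]
dictionary = {"portique": "portico", "granit": "granite"}
print(replace_posunk(target, source, dictionary))  # -> ['The', 'portico', 'in', 'granite']
```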

SLIDE 19


Results with PosUnk