CS480/680 Lecture 18 (July 8, 2019): Recurrent and Recursive Neural Networks



SLIDE 1

CS480/680 Lecture 18: July 8, 2019

Recurrent and Recursive Neural Networks [GBC] Chap. 10

SLIDE 2

Variable length data

  • Traditional feed-forward neural networks can only handle fixed-length data
  • Variable-length data (e.g., sequences, time series, spatial data) leads to a variable number of parameters

  • Solutions:
    – Recurrent neural networks
    – Recursive neural networks

SLIDE 3

Recurrent Neural Network (RNN)

  • In RNNs, outputs can be fed back to the network as inputs, creating a recurrent structure that can be unrolled to handle variable-length data.
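A minimal sketch of such an unrolled RNN in NumPy (the function and parameter names are illustrative, not from the slides): the same weights are reused at every time step, so sequences of any length can be processed.

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    """Unroll a vanilla RNN over a variable-length sequence xs.
    The same weights (Wxh, Whh, Why, bh, by) are shared across all steps."""
    h, hs, ys = h0, [], []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # hidden state fed back into the network
        y = Why @ h + by                      # output at this time step
        hs.append(h)
        ys.append(y)
    return ys, hs
```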

SLIDE 4

Training

  • Recurrent neural networks are trained by backpropagation on the unrolled network
    – E.g., backpropagation through time

  • Weight sharing:
    – Combine the gradients of shared weights into a single gradient (see the sketch below)

  • Challenges:
    – Gradient vanishing (and explosion)
    – Long-range memory
    – Prediction drift
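A sketch of backpropagation through time for the vanilla RNN above, assuming the forward pass stored the inputs xs, hidden states hs, and outputs ys (e.g., from the rnn_forward sketch on the RNN slide) and a squared-error loss at each step. The += lines are where the gradients of each shared weight from all time steps are combined into a single gradient.

```python
import numpy as np

def bptt(xs, hs, ys, targets, h0, Wxh, Whh, Why, bh, by):
    """Backpropagation through the unrolled network with
    loss = sum_t 0.5 * ||y_t - target_t||^2."""
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]                 # gradient of the step-t loss w.r.t. y_t
        dWhy += np.outer(dy, hs[t]); dby += dy  # shared output weights: gradients summed
        dh = Why.T @ dy + dh_next               # gradient flowing into h_t
        dpre = (1.0 - hs[t] ** 2) * dh          # back through the tanh nonlinearity
        dbh += dpre
        dWxh += np.outer(dpre, xs[t])           # shared input weights: gradients summed
        h_prev = hs[t - 1] if t > 0 else h0
        dWhh += np.outer(dpre, h_prev)          # shared recurrent weights: gradients summed
        dh_next = Whh.T @ dpre                  # propagate the gradient to earlier steps
    return dWxh, dWhh, dWhy, dbh, dby
```

Repeated multiplication by Whh.T and the tanh derivative is what makes these gradients vanish (or explode) over long spans; in practice the summed gradients are often clipped to a maximum norm.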

SLIDE 5

RNN for belief monitoring

  • An HMM can be simulated and generalized by an RNN
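As a sketch of what the RNN is reproducing, here is one step of exact HMM belief monitoring (the matrix names T and O are assumptions of this sketch): predict through the transition model, reweight by the observation likelihood, and renormalize. An RNN generalizes this by replacing the fixed linear-and-normalize update with a learned nonlinear update of its hidden state.

```python
import numpy as np

def hmm_belief_update(belief, obs, T, O):
    """One step of belief monitoring in an HMM.
    belief[s] = P(state = s | observations so far)
    T[s, s2]  = transition probability P(s2 | s)
    O[s2, o]  = observation probability P(o | s2)"""
    predicted = T.T @ belief             # sum_s P(s' | s) * belief(s)
    corrected = O[:, obs] * predicted    # weight by the likelihood P(obs | s')
    return corrected / corrected.sum()   # renormalize to a distribution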

SLIDE 6

Bi-Directional RNN

  • We can combine past and future evidence in separate chains
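A sketch of that combination, reusing the rnn_forward function from the earlier RNN sketch (so it inherits those assumptions): one chain reads the sequence left-to-right (past evidence), a second reads it right-to-left (future evidence), and the two hidden states for each position are concatenated.

```python
import numpy as np

def birnn_forward(xs, h0_fwd, h0_bwd, fwd_params, bwd_params):
    """Bidirectional RNN: separate forward and backward chains,
    combined per position by concatenation."""
    _, hs_fwd = rnn_forward(xs, h0_fwd, *fwd_params)        # left-to-right chain
    _, hs_bwd = rnn_forward(xs[::-1], h0_bwd, *bwd_params)  # right-to-left chain
    hs_bwd = hs_bwd[::-1]                                   # realign with the input positions
    return [np.concatenate([hf, hb]) for hf, hb in zip(hs_fwd, hs_bwd)]
```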

SLIDE 7

Encoder-Decoder Model

  • Also known as sequence2sequence
    – x^{(t)}: t-th input
    – y^{(t)}: t-th output
    – c: context (embedding)

  • Usage:
    – Machine translation
    – Question answering
    – Dialog
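A minimal encoder-decoder sketch under the same assumptions as the earlier rnn_forward sketch: the encoder compresses the input sequence into a context vector c (its final hidden state), and a second RNN decodes from c, feeding each output back in. Real systems decode token probabilities and train with teacher forcing; this only illustrates the role of c.

```python
import numpy as np

def encoder_decoder(xs, out_len, h0_enc, enc_params, dec_params, y0):
    """seq2seq sketch: encode the input into a context vector c,
    then decode out_len outputs starting from c."""
    _, enc_hs = rnn_forward(xs, h0_enc, *enc_params)  # encoder chain
    c = enc_hs[-1]                                    # context (embedding) of the whole input
    Wxh, Whh, Why, bh, by = dec_params                # decoder weights (same layout as encoder)
    h, y, ys = c, y0, []                              # assumes encoder/decoder hidden dims match
    for _ in range(out_len):
        h = np.tanh(Wxh @ y + Whh @ h + bh)           # previous output fed back as input
        y = Why @ h + by
        ys.append(y)
    return ys
```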

SLIDE 8

Machine Translation

  • Cho, van Merrienboer, Gulcehre, Bahdanau, Bougares, Schwenk, Bengio (2014): Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

SLIDE 9

Long Short Term Memory (LSTM)

  • Special gated structure to control memorization and forgetting in RNNs
  • Mitigate gradient vanishing
  • Facilitate long-term memory

SLIDE 10

Unrolled LSTM

  • Picture

SLIDE 11

LSTM cell in practice

  • Adjustments:
    – Hidden state h_t is called the cell state c_t
    – Output y_t is called the hidden state h_t

  • Update equations:
    Input gate:     i_t = \sigma(W^{(xi)} \bar{x}_t + W^{(hi)} h_{t-1})
    Forget gate:    f_t = \sigma(W^{(xf)} \bar{x}_t + W^{(hf)} h_{t-1})
    Output gate:    o_t = \sigma(W^{(xo)} \bar{x}_t + W^{(ho)} h_{t-1})
    Process input:  \tilde{c}_t = \tanh(W^{(x\tilde{c})} \bar{x}_t + W^{(h\tilde{c})} h_{t-1})
    Cell update:    c_t = f_t \ast c_{t-1} + i_t \ast \tilde{c}_t
    Output:         y_t = h_t = o_t \ast \tanh(c_t)
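A sketch of one LSTM step implementing the update equations above (the dictionary-based parameter layout is an assumption of the sketch, and bias terms, which the slide omits, are left out here too; practical implementations add them).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, Wx, Wh):
    """One LSTM step. Wx['i'] plays the role of W^(xi), Wh['i'] of W^(hi),
    and similarly for the gates f, o and the candidate c."""
    i = sigmoid(Wx['i'] @ x + Wh['i'] @ h_prev)        # input gate
    f = sigmoid(Wx['f'] @ x + Wh['f'] @ h_prev)        # forget gate
    o = sigmoid(Wx['o'] @ x + Wh['o'] @ h_prev)        # output gate
    c_tilde = np.tanh(Wx['c'] @ x + Wh['c'] @ h_prev)  # processed input
    c = f * c_prev + i * c_tilde   # cell update: forget part of the old state, write the new
    h = o * np.tanh(c)             # output y_t = h_t
    return h, c
```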

SLIDE 12

Gated Recurrent Unit (GRU)

  • Simplified LSTM
    – No cell state
    – Two gates (instead of three)
    – Fewer weights

  • Update equations:
    Reset gate:          r_t = \sigma(W^{(xr)} \bar{x}_t + W^{(hr)} h_{t-1})
    Update gate:         z_t = \sigma(W^{(xz)} \bar{x}_t + W^{(hz)} h_{t-1})
    Process input:       \tilde{h}_t = \tanh(W^{(x\tilde{h})} \bar{x}_t + r_t \ast (W^{(h\tilde{h})} h_{t-1}))
    Hidden state update: h_t = (1 - z_t) \ast h_{t-1} + z_t \ast \tilde{h}_t
    Output:              y_t = h_t
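The corresponding sketch for one GRU step, following the equations above (no cell state, only a reset and an update gate; the parameter layout is again an assumption).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h_prev, Wx, Wh):
    """One GRU step. Wx['r'] plays the role of W^(xr), Wh['r'] of W^(hr),
    and similarly for the update gate z and the candidate h."""
    r = sigmoid(Wx['r'] @ x + Wh['r'] @ h_prev)              # reset gate
    z = sigmoid(Wx['z'] @ x + Wh['z'] @ h_prev)              # update gate
    h_tilde = np.tanh(Wx['h'] @ x + r * (Wh['h'] @ h_prev))  # processed input
    h = (1.0 - z) * h_prev + z * h_tilde  # interpolate between the old state and the candidate
    return h
```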

SLIDE 13

Attention

  • Mechanism for alignment in machine translation, image captioning, etc.

  • Attention in machine translation: align each output word with relevant input words by computing a softmax over the inputs
    – Context vector c_i: weighted sum of the input encodings h_j
      c_i = \sum_j \alpha_{ij} h_j
    – where \alpha_{ij} is an alignment weight between input encoding h_j and output encoding s_i:
      \alpha_{ij} = \exp(\mathrm{alignment}(s_{i-1}, h_j)) / \sum_{j'} \exp(\mathrm{alignment}(s_{i-1}, h_{j'}))   (softmax)
    – Alignment example: \mathrm{alignment}(s_{i-1}, h_j) = s_{i-1}^\top h_j
SLIDE 14

Attention

  • Picture

SLIDE 15

Machine Translation with Bidirectional RNNs, LSTM units and attention

  • Bahdanau, Cho, Bengio (ICLR-2015)
  • BLEU: BiLingual Evaluation Understudy
    – Roughly, the percentage of translated words that appear in the ground-truth reference
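As a rough illustration of the simplified description above (not the full BLEU metric, which uses clipped n-gram precisions and a brevity penalty), a sketch of that word-overlap percentage:

```python
def unigram_overlap(translation, reference):
    """Fraction of translated words that also appear in the ground-truth reference."""
    ref_words = set(reference)
    hits = sum(1 for word in translation if word in ref_words)
    return hits / len(translation)

# Example: 3 of the 4 translated words appear in the reference -> 0.75
# unigram_overlap("the cat sat down".split(), "the cat sat on the mat".split())
```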


(Plot legend: RNNsearch = with attention; RNNenc = no attention)

SLIDE 16

Alignment example

  • Bahdanau, Cho, Bengio (ICLR-2015)

SLIDE 17

Recursive Neural network

  • Recursive neural networks generalize recurrent neural networks from chains to trees.
  • Weight sharing allows trees of different sizes to fit variable-length data.
  • What structure should the tree follow?
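A sketch of a recursive network over a binary tree (the tree representation and names are assumptions of the sketch): one shared composition function is applied bottom-up, so the same weights serve trees of any size; the tree structure itself can come from a parse, as on the next slide.

```python
import numpy as np

def recursive_encode(node, W, b, embed):
    """Encode a tree bottom-up. A node is either a word (leaf) or a
    (left, right) pair; the same W and b are shared at every internal node."""
    if isinstance(node, str):
        return embed[node]                            # leaf: word embedding lookup
    left, right = node
    h_left = recursive_encode(left, W, b, embed)      # encode the subtrees first
    h_right = recursive_encode(right, W, b, embed)
    return np.tanh(W @ np.concatenate([h_left, h_right]) + b)  # compose the children

# Usage: recursive_encode(("the", ("cat", "sleeps")), W, b, embed) -> one vector for the sentence
```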

SLIDE 18

Example: Semantic Parsing

  • Use a parse tree or dependency graph as the structure of the recursive neural network

  • Example:
