CS480/680 Lecture 18: July 8, 2019 Recurrent and Recursive Neural Networks [GBC] Chap. 10

  1. CS480/680 Lecture 18: July 8, 2019 Recurrent and Recursive Neural Networks [GBC] Chap. 10

  2. Variable length data
  • Traditional feed-forward neural networks can only handle fixed-length data
  • Variable-length data (e.g., sequences, time series, spatial data) leads to a variable # of parameters
  • Solutions:
    – Recurrent neural networks
    – Recursive neural networks

  3. Recurrent Neural Network (RNN)
  • In RNNs, outputs can be fed back to the network as inputs, creating a recurrent structure that can be unrolled to handle variable-length data
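To make the unrolling concrete, here is a minimal sketch (not from the slides) of a vanilla RNN applied step by step to a variable-length sequence; the weight names and dimensions are illustrative assumptions.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Unroll a vanilla RNN over a variable-length sequence xs.

    The same weights are reused at every time step (weight sharing),
    which is what lets one network handle sequences of any length.
    """
    h = np.zeros(W_hh.shape[0])                  # initial hidden state
    ys = []
    for x in xs:                                 # one unrolled step per input
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)   # new hidden state
        ys.append(W_hy @ h + b_y)                # output computed from hidden state
    return ys, h

# Illustrative dimensions: 4-dimensional inputs, 8 hidden units, 3 outputs
rng = np.random.default_rng(0)
W_xh, W_hh = rng.normal(size=(8, 4)), rng.normal(size=(8, 8))
W_hy, b_h, b_y = rng.normal(size=(3, 8)), np.zeros(8), np.zeros(3)
ys, h_T = rnn_forward([rng.normal(size=4) for _ in range(5)], W_xh, W_hh, W_hy, b_h, b_y)
```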

  4. Training
  • Recurrent neural networks are trained by backpropagation on the unrolled network
    – E.g., backpropagation through time
  • Weight sharing:
    – Combine gradients of shared weights into a single gradient
  • Challenges:
    – Gradient vanishing (and explosion)
    – Long-range memory
    – Prediction drift
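A minimal sketch of backpropagation through time for the vanilla RNN sketched above, assuming the forward pass saved the hidden states h_0, ..., h_T and that the loss depends only on the last hidden state; all names are illustrative. It shows the two points from the slide: per-step gradients of a shared weight are summed into one gradient, and the repeated Jacobian products are what cause vanishing or exploding gradients.

```python
import numpy as np

def bptt(xs, hs, W_xh, W_hh, dL_dh_T):
    """Backpropagation through time on the unrolled vanilla RNN.

    hs = [h_0, h_1, ..., h_T] are hidden states saved during the forward pass;
    dL_dh_T is the loss gradient w.r.t. the last hidden state.
    """
    dW_xh = np.zeros_like(W_xh)
    dW_hh = np.zeros_like(W_hh)
    dh = dL_dh_T
    for t in reversed(range(len(xs))):
        # h_t = tanh(a_t)  =>  gradient w.r.t. pre-activation: da = (1 - h_t^2) * dh
        da = (1.0 - hs[t + 1] ** 2) * dh
        dW_xh += np.outer(da, xs[t])      # gradients of shared weights are summed
        dW_hh += np.outer(da, hs[t])
        dh = W_hh.T @ da                  # repeated products like this are what make
                                          # gradients vanish or explode over long spans
    return dW_xh, dW_hh
```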

  5. RNN for belief monitoring
  • An HMM can be simulated and generalized by an RNN
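To see why, here is a sketch (illustrative, not from the slides) of one step of HMM belief monitoring. The update has exactly the form of an RNN cell, a recurrence b_t = f(b_{t-1}, o_t), so an RNN with learned nonlinear updates can simulate and generalize it.

```python
import numpy as np

def hmm_belief_update(belief, obs, T, O):
    """One step of HMM belief monitoring (the forward-algorithm recurrence).

    belief: current distribution over hidden states
    T[i, j] = P(next state j | current state i)
    O[j, o] = P(observation o | state j)
    """
    b = O[:, obs] * (T.T @ belief)    # predict, then weight by observation likelihood
    return b / b.sum()                # renormalize to a distribution

# Illustrative 2-state, 2-observation HMM
T = np.array([[0.9, 0.1], [0.2, 0.8]])
O = np.array([[0.7, 0.3], [0.1, 0.9]])
b = np.array([0.5, 0.5])
for obs in [0, 0, 1]:
    b = hmm_belief_update(b, obs, T, O)
```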

  6. Bi-Directional RNN
  • We can combine past and future evidence in separate chains
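A minimal sketch of the two-chain idea, with illustrative function names: one chain reads the sequence left to right (past evidence), the other right to left (future evidence), and their hidden states are concatenated at each position.

```python
import numpy as np

def birnn_states(xs, cell_fwd, cell_bwd, h0_fwd, h0_bwd):
    """Combine past and future evidence with two separate RNN chains.

    cell_fwd and cell_bwd are functions (h, x) -> new hidden state.
    """
    fwd, h = [], h0_fwd
    for x in xs:                      # forward chain: past evidence
        h = cell_fwd(h, x)
        fwd.append(h)
    bwd, h = [], h0_bwd
    for x in reversed(xs):            # backward chain: future evidence
        h = cell_bwd(h, x)
        bwd.append(h)
    bwd.reverse()
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```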

  7. Encoder-Decoder Model
  • Also known as sequence2sequence
    – x^(t): t-th input
    – y^(t): t-th output
    – c: context (embedding)
  • Usage:
    – Machine translation
    – Question answering
    – Dialog
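A minimal sketch of the encoder-decoder structure, with illustrative stand-ins for the cells and readout: the encoder folds the inputs x^(1..T) into a context c, and the decoder generates outputs y^(1..T') conditioned on c.

```python
def encode(xs, enc_cell, h0):
    """Encoder: fold the input sequence into a single context vector c."""
    h = h0
    for x in xs:
        h = enc_cell(h, x)
    return h                                # c = final encoder hidden state

def decode(c, dec_cell, readout, y_start, steps):
    """Decoder: generate an output sequence conditioned on the context c.

    dec_cell: (h, y_prev) -> h; readout: h -> y.  Here c initializes the
    decoder's hidden state (one common choice); feeding c at every step
    is another.
    """
    h, y, ys = c, y_start, []
    for _ in range(steps):
        h = dec_cell(h, y)
        y = readout(h)
        ys.append(y)
    return ys
```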

  8. Machine Translation
  • Cho, van Merrienboer, Gulcehre, Bahdanau, Bougares, Schwenk, Bengio (2014) Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

  9. Long Short Term Memory (LSTM)
  • Special gated structure to control memorization and forgetting in RNNs
  • Mitigates gradient vanishing
  • Facilitates long-term memory

  10. Unrolled LSTM
  • [Picture: unrolled LSTM network]

  11. LSTM cell in practice
  • Adjustments:
    – Hidden state $h_t$ is called the cell state $c_t$
    – Output $y_t$ is called the hidden state $h_t$
  • Update equations:
    – Input gate: $i_t = \sigma(W^{(xi)} x_t + W^{(hi)} h_{t-1})$
    – Forget gate: $f_t = \sigma(W^{(xf)} x_t + W^{(hf)} h_{t-1})$
    – Output gate: $o_t = \sigma(W^{(xo)} x_t + W^{(ho)} h_{t-1})$
    – Process input: $\tilde{c}_t = \tanh(W^{(x\tilde{c})} x_t + W^{(h\tilde{c})} h_{t-1})$
    – Cell update: $c_t = f_t * c_{t-1} + i_t * \tilde{c}_t$
    – Output: $y_t = h_t = o_t * \tanh(c_t)$
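A direct sketch of these update equations as one LSTM step; the dictionary of weight matrices is an illustrative convention and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step following the update equations above.

    W is a dict of weight matrices, e.g. W['xi'], W['hi'] for the input gate.
    """
    i = sigmoid(W['xi'] @ x + W['hi'] @ h_prev)        # input gate
    f = sigmoid(W['xf'] @ x + W['hf'] @ h_prev)        # forget gate
    o = sigmoid(W['xo'] @ x + W['ho'] @ h_prev)        # output gate
    c_tilde = np.tanh(W['xc'] @ x + W['hc'] @ h_prev)  # processed input
    c = f * c_prev + i * c_tilde                       # cell state update
    h = o * np.tanh(c)                                 # output = hidden state
    return h, c
```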

  12. Gated Recurrent Unit (GRU)
  • Simplified LSTM
    – No cell state
    – Two gates (instead of three)
    – Fewer weights
  • Update equations:
    – Reset gate: $r_t = \sigma(W^{(xr)} x_t + W^{(hr)} h_{t-1})$
    – Update gate: $z_t = \sigma(W^{(xz)} x_t + W^{(hz)} h_{t-1})$
    – Process input: $\tilde{h}_t = \tanh(W^{(x\tilde{h})} x_t + W^{(h\tilde{h})} (r_t * h_{t-1}))$
    – Hidden state update: $h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$
    – Output: $y_t = h_t$
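The corresponding one-step sketch for the GRU, again with an illustrative weight dictionary and biases omitted:

```python
import numpy as np

def gru_step(x, h_prev, W):
    """One GRU step following the update equations above.

    W is a dict of weight matrices, e.g. W['xr'], W['hr'] for the reset gate.
    """
    r = 1.0 / (1.0 + np.exp(-(W['xr'] @ x + W['hr'] @ h_prev)))  # reset gate
    z = 1.0 / (1.0 + np.exp(-(W['xz'] @ x + W['hz'] @ h_prev)))  # update gate
    h_tilde = np.tanh(W['xh'] @ x + W['hh'] @ (r * h_prev))      # processed input
    h = (1.0 - z) * h_prev + z * h_tilde                         # hidden state update
    return h                                                     # output y_t = h_t
```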

  13. Attention
  • Mechanism for alignment in machine translation, image captioning, etc.
  • Attention in machine translation: align each output word with relevant input words by computing a softmax over the inputs
    – Context vector $c_t$: weighted sum of input encodings $h_j$: $c_t = \sum_j a_{tj} h_j$
    – where $a_{tj}$ is an alignment weight between input encoding $h_j$ and output encoding $s_t$
    – $a_{tj} = \frac{\exp(\mathrm{alignment}(s_{t-1}, h_j))}{\sum_{j'} \exp(\mathrm{alignment}(s_{t-1}, h_{j'}))}$ (softmax)
    – Alignment example: $\mathrm{alignment}(s_{t-1}, h_j) = s_{t-1}^{T} h_j$
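A short sketch of these two equations, using the dot-product alignment example from the slide (a small feed-forward network is another common choice for the alignment score):

```python
import numpy as np

def attention_context(s_prev, hs):
    """Compute one attention context vector c_t.

    s_prev: previous decoder (output) encoding s_{t-1}
    hs: matrix of input encodings, one row per input position j
    """
    scores = hs @ s_prev                 # alignment(s_{t-1}, h_j) = s_{t-1}^T h_j
    a = np.exp(scores - scores.max())    # softmax over input positions
    a = a / a.sum()
    return a @ hs, a                     # c_t = sum_j a_tj h_j, plus the weights a_tj
```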

  14. Attention
  • [Picture: attention mechanism]

  15. Machine Translation with Bidirectional RNNs, LSTM units and attention
  • Bahdanau, Cho, Bengio (ICLR-2015)
    – RNNsearch: with attention
    – RNNenc: no attention
  • BLEU: BiLingual Evaluation Understudy
    – Percentage of translated words that appear in the ground truth
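A sketch of the simplified BLEU described on the slide, i.e. clipped unigram precision; full BLEU also uses higher-order n-grams and a brevity penalty. The example sentences are illustrative.

```python
from collections import Counter

def unigram_precision(translation, reference):
    """Fraction of words in the system translation that also appear in the
    ground-truth reference, with counts clipped by the reference."""
    trans, ref = Counter(translation), Counter(reference)
    matched = sum(min(n, ref[w]) for w, n in trans.items())
    return matched / max(len(translation), 1)

# Illustrative example: 4 of the 6 translated words appear in the reference
print(unigram_precision("the cat sat on the mat".split(),
                        "there is a cat on the mat".split()))  # 4/6
```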

  16. Alignment example
  • Bahdanau, Cho, Bengio (ICLR-2015)

  17. Recursive Neural Network
  • Recursive neural networks generalize recurrent neural networks from chains to trees
  • Weight sharing allows trees of different sizes to fit variable-length data
  • What structure should the tree follow?
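A minimal sketch of the chain-to-tree generalization, with illustrative names: the same composition weights are applied at every internal node of a given tree, so trees of any size and shape can be processed.

```python
import numpy as np

def recursive_nn(node, W, U, embed):
    """Compose node representations bottom-up along a binary tree.

    node: either a word (leaf) or a (left, right) pair of sub-trees.
    embed: maps a leaf word to its vector.
    """
    if not isinstance(node, tuple):                   # leaf: look up its embedding
        return embed(node)
    left = recursive_nn(node[0], W, U, embed)
    right = recursive_nn(node[1], W, U, embed)
    return np.tanh(W @ left + U @ right)              # shared composition weights

# Illustrative binary parse tree for "the cat sat"
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=8) for w in ["the", "cat", "sat"]}
W, U = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
root = recursive_nn((("the", "cat"), "sat"), W, U, vecs.get)
```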

  18. Example: Semantic Parsing
  • Use a parse tree or dependency graph as the structure of the recursive neural network
  • Example:
