CS480/680 Lecture 18 (July 8, 2019): Recurrent and Recursive Neural Networks



SLIDE 1

CS480/680 Lecture 18: July 8, 2019

Recurrent and Recursive Neural Networks [GBC] Chap. 10

SLIDE 2

Variable length data

  • Traditional feed-forward neural networks can only handle fixed-length data
  • Variable-length data (e.g., sequences, time series, spatial data) leads to a variable number of parameters

  • Solutions:
    – Recurrent neural networks
    – Recursive neural networks

SLIDE 3

Recurrent Neural Network (RNN)

  • In RNNs, outputs can be fed back to the network as inputs, creating a recurrent structure that can be unrolled to handle variable-length data.
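A minimal sketch of such an unrolled RNN in NumPy (the function and parameter names are illustrative, not from the slides): the same weights are reused at every time step, so sequences of any length can be processed.

```python
import numpy as np

def rnn_forward(xs, h0, Wxh, Whh, Why, bh, by):
    """Unroll a vanilla RNN over a variable-length sequence xs.
    The same weights (Wxh, Whh, Why, bh, by) are shared across all steps."""
    h, hs, ys = h0, [], []
    for x in xs:
        h = np.tanh(Wxh @ x + Whh @ h + bh)   # hidden state fed back into the network
        y = Why @ h + by                      # output at this time step
        hs.append(h)
        ys.append(y)
    return ys, hs
```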

SLIDE 4

Training

  • Recurrent neural networks are trained by backpropagation on the unrolled network
    – E.g., backpropagation through time

  • Weight sharing:
    – Combine the gradients of shared weights into a single gradient (see the sketch below)

  • Challenges:
    – Gradient vanishing (and explosion)
    – Long-range memory
    – Prediction drift
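A sketch of backpropagation through time for the vanilla RNN above, assuming the forward pass stored the inputs xs, hidden states hs, and outputs ys (e.g., from the rnn_forward sketch on the RNN slide) and a squared-error loss at each step. The += lines are where the gradients of each shared weight from all time steps are combined into a single gradient.

```python
import numpy as np

def bptt(xs, hs, ys, targets, h0, Wxh, Whh, Why, bh, by):
    """Backpropagation through the unrolled network with
    loss = sum_t 0.5 * ||y_t - target_t||^2."""
    dWxh, dWhh, dWhy = np.zeros_like(Wxh), np.zeros_like(Whh), np.zeros_like(Why)
    dbh, dby = np.zeros_like(bh), np.zeros_like(by)
    dh_next = np.zeros_like(h0)
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]                 # gradient of the step-t loss w.r.t. y_t
        dWhy += np.outer(dy, hs[t]); dby += dy  # shared output weights: gradients summed
        dh = Why.T @ dy + dh_next               # gradient flowing into h_t
        dpre = (1.0 - hs[t] ** 2) * dh          # back through the tanh nonlinearity
        dbh += dpre
        dWxh += np.outer(dpre, xs[t])           # shared input weights: gradients summed
        h_prev = hs[t - 1] if t > 0 else h0
        dWhh += np.outer(dpre, h_prev)          # shared recurrent weights: gradients summed
        dh_next = Whh.T @ dpre                  # propagate the gradient to earlier steps
    return dWxh, dWhh, dWhy, dbh, dby
```

Repeated multiplication by Whh.T and the tanh derivative is what makes these gradients vanish (or explode) over long spans; in practice the summed gradients are often clipped to a maximum norm.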

SLIDE 5

RNN for belief monitoring

  • An HMM can be simulated and generalized by an RNN
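As a sketch of what the RNN is reproducing, here is one step of exact HMM belief monitoring (the matrix names T and O are assumptions of this sketch): predict through the transition model, reweight by the observation likelihood, and renormalize. An RNN generalizes this by replacing the fixed linear-and-normalize update with a learned nonlinear update of its hidden state.

```python
import numpy as np

def hmm_belief_update(belief, obs, T, O):
    """One step of belief monitoring in an HMM.
    belief[s] = P(state = s | observations so far)
    T[s, s2]  = transition probability P(s2 | s)
    O[s2, o]  = observation probability P(o | s2)"""
    predicted = T.T @ belief             # sum_s P(s' | s) * belief(s)
    corrected = O[:, obs] * predicted    # weight by the likelihood P(obs | s')
    return corrected / corrected.sum()   # renormalize to a distribution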

SLIDE 6

Bi-Directional RNN

  • We can combine past and future evidence in separate chains
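A sketch of that combination, reusing the rnn_forward function from the earlier RNN sketch (so it inherits those assumptions): one chain reads the sequence left-to-right (past evidence), a second reads it right-to-left (future evidence), and the two hidden states for each position are concatenated.

```python
import numpy as np

def birnn_forward(xs, h0_fwd, h0_bwd, fwd_params, bwd_params):
    """Bidirectional RNN: separate forward and backward chains,
    combined per position by concatenation."""
    _, hs_fwd = rnn_forward(xs, h0_fwd, *fwd_params)        # left-to-right chain
    _, hs_bwd = rnn_forward(xs[::-1], h0_bwd, *bwd_params)  # right-to-left chain
    hs_bwd = hs_bwd[::-1]                                   # realign with the input positions
    return [np.concatenate([hf, hb]) for hf, hb in zip(hs_fwd, hs_bwd)]
```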

SLIDE 7

Encoder-Decoder Model

  • Also known as sequence2sequence
    – x^{(t)}: t-th input
    – y^{(t)}: t-th output
    – c: context (embedding)

  • Usage:
    – Machine translation
    – Question answering
    – Dialog
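A minimal encoder-decoder sketch under the same assumptions as the earlier rnn_forward sketch: the encoder compresses the input sequence into a context vector c (its final hidden state), and a second RNN decodes from c, feeding each output back in. Real systems decode token probabilities and train with teacher forcing; this only illustrates the role of c.

```python
import numpy as np

def encoder_decoder(xs, out_len, h0_enc, enc_params, dec_params, y0):
    """seq2seq sketch: encode the input into a context vector c,
    then decode out_len outputs starting from c."""
    _, enc_hs = rnn_forward(xs, h0_enc, *enc_params)  # encoder chain
    c = enc_hs[-1]                                    # context (embedding) of the whole input
    Wxh, Whh, Why, bh, by = dec_params                # decoder weights (same layout as encoder)
    h, y, ys = c, y0, []                              # assumes encoder/decoder hidden dims match
    for _ in range(out_len):
        h = np.tanh(Wxh @ y + Whh @ h + bh)           # previous output fed back as input
        y = Why @ h + by
        ys.append(y)
    return ys
```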

SLIDE 8

Machine Translation

  • Cho, van Merrienboer, Gulcehre, Bahdanau, Bougares, Schwenk, Bengio (2014): Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

SLIDE 9

Long Short Term Memory (LSTM)

  • Special gated structure to control memorization and forgetting in RNNs
  • Mitigate gradient vanishing
  • Facilitate long-term memory

SLIDE 10

Unrolled LSTM

  • Picture

SLIDE 11

LSTM cell in practice

  • Adjustments:
    – Hidden state h_t is called the cell state c_t
    – Output y_t is called the hidden state h_t

  • Update equations:
    Input gate:     i_t = \sigma(W^{(xi)} \bar{x}_t + W^{(hi)} h_{t-1})
    Forget gate:    f_t = \sigma(W^{(xf)} \bar{x}_t + W^{(hf)} h_{t-1})
    Output gate:    o_t = \sigma(W^{(xo)} \bar{x}_t + W^{(ho)} h_{t-1})
    Process input:  \tilde{c}_t = \tanh(W^{(x\tilde{c})} \bar{x}_t + W^{(h\tilde{c})} h_{t-1})
    Cell update:    c_t = f_t \ast c_{t-1} + i_t \ast \tilde{c}_t
    Output:         y_t = h_t = o_t \ast \tanh(c_t)
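A sketch of one LSTM step implementing the update equations above (the dictionary-based parameter layout is an assumption of the sketch, and bias terms, which the slide omits, are left out here too; practical implementations add them).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h_prev, c_prev, Wx, Wh):
    """One LSTM step. Wx['i'] plays the role of W^(xi), Wh['i'] of W^(hi),
    and similarly for the gates f, o and the candidate c."""
    i = sigmoid(Wx['i'] @ x + Wh['i'] @ h_prev)        # input gate
    f = sigmoid(Wx['f'] @ x + Wh['f'] @ h_prev)        # forget gate
    o = sigmoid(Wx['o'] @ x + Wh['o'] @ h_prev)        # output gate
    c_tilde = np.tanh(Wx['c'] @ x + Wh['c'] @ h_prev)  # processed input
    c = f * c_prev + i * c_tilde   # cell update: forget part of the old state, write the new
    h = o * np.tanh(c)             # output y_t = h_t
    return h, c
```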

SLIDE 12

Gated Recurrent Unit (GRU)

  • Simplified LSTM
    – No cell state
    – Two gates (instead of three)
    – Fewer weights

  • Update equations:
    Reset gate:          r_t = \sigma(W^{(xr)} \bar{x}_t + W^{(hr)} h_{t-1})
    Update gate:         z_t = \sigma(W^{(xz)} \bar{x}_t + W^{(hz)} h_{t-1})
    Process input:       \tilde{h}_t = \tanh(W^{(x\tilde{h})} \bar{x}_t + r_t \ast (W^{(h\tilde{h})} h_{t-1}))
    Hidden state update: h_t = (1 - z_t) \ast h_{t-1} + z_t \ast \tilde{h}_t
    Output:              y_t = h_t
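The corresponding sketch for one GRU step, following the equations above (no cell state, only a reset and an update gate; the parameter layout is again an assumption).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h_prev, Wx, Wh):
    """One GRU step. Wx['r'] plays the role of W^(xr), Wh['r'] of W^(hr),
    and similarly for the update gate z and the candidate h."""
    r = sigmoid(Wx['r'] @ x + Wh['r'] @ h_prev)              # reset gate
    z = sigmoid(Wx['z'] @ x + Wh['z'] @ h_prev)              # update gate
    h_tilde = np.tanh(Wx['h'] @ x + r * (Wh['h'] @ h_prev))  # processed input
    h = (1.0 - z) * h_prev + z * h_tilde  # interpolate between the old state and the candidate
    return h
```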

SLIDE 13

Attention

  • Mechanism for alignment in machine translation, image captioning, etc.

  • Attention in machine translation: align each output word with relevant input words by computing a softmax over the inputs
    – Context vector c_i: weighted sum of the input encodings h_j
      c_i = \sum_j \alpha_{ij} h_j
    – where \alpha_{ij} is an alignment weight between input encoding h_j and output encoding s_i:
      \alpha_{ij} = \exp(\mathrm{alignment}(s_{i-1}, h_j)) / \sum_{j'} \exp(\mathrm{alignment}(s_{i-1}, h_{j'}))   (softmax)
    – Alignment example: \mathrm{alignment}(s_{i-1}, h_j) = s_{i-1}^\top h_j
SLIDE 14

Attention

  • Picture

SLIDE 15

Machine Translation with Bidirectional RNNs, LSTM units and attention

  • Bahdanau, Cho, Bengio (ICLR-2015)
  • BLEU: BiLingual Evaluation Understudy
    – Roughly, the percentage of translated words that appear in the ground-truth reference
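As a rough illustration of the simplified description above (not the full BLEU metric, which uses clipped n-gram precisions and a brevity penalty), a sketch of that word-overlap percentage:

```python
def unigram_overlap(translation, reference):
    """Fraction of translated words that also appear in the ground-truth reference."""
    ref_words = set(reference)
    hits = sum(1 for word in translation if word in ref_words)
    return hits / len(translation)

# Example: 3 of the 4 translated words appear in the reference -> 0.75
# unigram_overlap("the cat sat down".split(), "the cat sat on the mat".split())
```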


(Plot legend: RNNsearch = with attention; RNNenc = no attention)

SLIDE 16

Alignment example

  • Bahdanau, Cho, Bengio (ICLR-2015)

SLIDE 17

Recursive Neural network

  • Recursive neural networks generalize recurrent neural networks from chains to trees.
  • Weight sharing allows trees of different sizes to fit variable-length data.
  • What structure should the tree follow?
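A sketch of a recursive network over a binary tree (the tree representation and names are assumptions of the sketch): one shared composition function is applied bottom-up, so the same weights serve trees of any size; the tree structure itself can come from a parse, as on the next slide.

```python
import numpy as np

def recursive_encode(node, W, b, embed):
    """Encode a tree bottom-up. A node is either a word (leaf) or a
    (left, right) pair; the same W and b are shared at every internal node."""
    if isinstance(node, str):
        return embed[node]                            # leaf: word embedding lookup
    left, right = node
    h_left = recursive_encode(left, W, b, embed)      # encode the subtrees first
    h_right = recursive_encode(right, W, b, embed)
    return np.tanh(W @ np.concatenate([h_left, h_right]) + b)  # compose the children

# Usage: recursive_encode(("the", ("cat", "sleeps")), W, b, embed) -> one vector for the sentence
```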

SLIDE 18

Example: Semantic Parsing

  • Use a parse tree or dependency graph as the structure of the recursive neural network

  • Example:
