CS5242 Neural Networks and Deep Learning
Lecture 09: RNN Applications II
Wei WANG TA: Yao SHU, Juncheng LIU, Qingpeng Cai cs5242@comp.nus.edu.sg
Recap: language modelling (training; model the joint probability; model the conditional probability). Applications: sentiment analysis, image captioning, machine translation, question answering.
Image source: https://karpathy.github.io/2015/05/21/rnn-effectiveness/
Seq2seq objective: find the model parameters $\Theta$ that maximize the conditional probability of the output sequence $\langle y_1, y_2, \dots, y_n \rangle$ given the input sequence $\langle x_1, x_2, \dots, x_m \rangle$:
$\max_{\Theta} P(y_1, y_2, \dots, y_n \mid x_1, x_2, \dots, x_m) = \max_{\Theta} \prod_{t=1}^{n} P(y_t \mid y_1, \dots, y_{t-1}, x_1, \dots, x_m)$
The encoder summarizes the input; the decoder models each conditional factor, one output token per step.
[Figure: encoder-decoder (seq2seq). The encoder RNN reads the input tokens A B C followed by END; its final state S initializes the decoder RNN, which emits W X Y Z and then END, with each emitted token fed back as the next decoder input (START at the first step).]
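A minimal sketch of this architecture in PyTorch (not the lecture's reference code; the class name, vocabulary sizes, and dimensions are illustrative assumptions):

```python
# Minimal encoder-decoder (seq2seq) sketch: the encoder compresses the source
# into its final hidden state S, which initializes the decoder.
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hid_dim=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab)

    def forward(self, src, tgt_in):
        _, s = self.encoder(self.src_emb(src))              # s: summary state S of the source
        dec_out, _ = self.decoder(self.tgt_emb(tgt_in), s)  # teacher forcing during training
        return self.out(dec_out)                            # per-step logits over target vocab

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
src = torch.randint(0, 1000, (2, 5))     # two source sequences of length 5
tgt_in = torch.randint(0, 1000, (2, 6))  # START followed by target tokens
logits = model(src, tgt_in)              # shape (2, 6, 1000)
```

At inference time the decoder instead runs one step at a time, feeding its own previous prediction back in, as the figure shows.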
Source (Chinese): 新加坡 地铁 不 稳定
Target (English): Singapore MRT is not stable
[Figure: encoder RNN reads the source tokens 新加坡 地铁 不; decoder RNN emits "Singapore MRT is not". Image from: https://distill.pub/2016/augmented-rnns/]
[Figure (demo): alignment between the source 新加坡 地铁 不 and the output "Singapore MRT is not". Image from: https://distill.pub/2016/augmented-rnns/]
Decoder with attention. At step $t$, a reset gate and the new decoder state are computed from the previous state $s_{t-1}$, the previous output word $y_{t-1}$, and the context vector $c_t$:
$r_t = \sigma\!\big([\,s_{t-1},\; y_{t-1},\; c_t\,]\, W_r\big)$
$s_t = \tanh\!\big([\,r_t \circ s_{t-1},\; y_{t-1},\; c_t\,]\, W\big)$
The context vector is a weighted sum of the encoder states,
$c_t = \sum_k \alpha_{tk}\, h_k$,
with attention weights given by a softmax over additive scores:
$\alpha_{tk} = \operatorname{softmax}_k(e_{tk}), \qquad e_{tk} = v_a^\top \tanh\!\big(s_{t-1} W_a + h_k U_a\big)$
[Figure: encoder RNN produces states $h_1, h_2, \dots$; the attention module combines them into $c_t$, which the decoder RNN uses together with $s_{t-1}$ and $y_{t-1}$ to compute $s_t$ and emit $y_t$.]
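A hedged sketch of one attention step in PyTorch, following the score and context equations above (tensor shapes and the weight names Wa, Ua, va are illustrative):

```python
# One additive-attention step: score each encoder state against the previous
# decoder state, normalize with softmax, and mix the encoder states into c_t.
import torch

def attention_step(s_prev, H, Wa, Ua, va):
    """s_prev: (hid,) decoder state s_{t-1}; H: (n, hid) encoder states h_1..h_n."""
    scores = torch.tanh(s_prev @ Wa + H @ Ua) @ va   # e_tk, one score per encoder state
    alpha = torch.softmax(scores, dim=0)             # attention weights alpha_tk
    c = (alpha.unsqueeze(1) * H).sum(dim=0)          # context c_t = sum_k alpha_tk h_k
    return c, alpha

hid, att, n = 8, 8, 4
H = torch.randn(n, hid)                              # encoder states
s_prev = torch.randn(hid)                            # previous decoder state
Wa, Ua, va = torch.randn(hid, att), torch.randn(hid, att), torch.randn(att)
c_t, alpha = attention_step(s_prev, H, Wa, Ua, va)   # alpha sums to 1
```

Because the weights $\alpha_{tk}$ sum to 1, $c_t$ is a convex combination of the encoder states, so each decoder step can focus on different source positions.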
[Figure: inputs to one decoder step: the previous state $s_{t-1}$, the previous output word $y_{t-1}$, and the context vector $c_t$.]
Trainable parameters: the decoder weight matrices ($W$, $W_r$) and the attention weights ($W_a$, $U_a$, $v_a$); the same equations are applied at every decoder step.
[Figure: attention over encoder states $h_1, h_2, h_3$: scores against decoder state $s_0$ yield weights that form the context $c_t$, used to compute the next state $s_1$.]
Apply self-attention repeatedly (stacked layers) to obtain better word embeddings. Image source: https://jalammar.github.io/illustrated-transformer/
Represent a word by considering the other words in its context. Image source: https://jalammar.github.io/illustrated-transformer/
One row per word. Image source: https://jalammar.github.io/illustrated-transformer/
[Figures: step-by-step self-attention computation (queries, keys, and values; scores; softmax; weighted sum). Image source: https://jalammar.github.io/illustrated-transformer/]
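The computation those figures walk through can be written compactly. Here is a hedged single-head sketch of scaled dot-product self-attention (the matrix names WQ, WK, WV and all sizes are illustrative assumptions):

```python
# Single-head scaled dot-product self-attention over an embedding matrix X
# with one row per word.
import torch

def self_attention(X, WQ, WK, WV):
    Q, K, V = X @ WQ, X @ WK, X @ WV          # queries, keys, values: one row per word
    d_k = K.shape[-1]
    scores = Q @ K.T / d_k ** 0.5             # word-to-word compatibility scores
    weights = torch.softmax(scores, dim=-1)   # each row sums to 1
    return weights @ V                        # context-aware representation per word

n_words, d_model, d_k = 4, 16, 8
X = torch.randn(n_words, d_model)             # embedding matrix, one row per word
WQ, WK, WV = (torch.randn(d_model, d_k) for _ in range(3))
Z = self_attention(X, WQ, WK, WV)             # shape (4, 8)
```

Repeating this transformation, as the earlier slide suggests, is what a stacked Transformer encoder does; a real block additionally uses multiple heads, a feed-forward sublayer, residual connections, and layer normalization.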
Source from [21]
Source from [22]
Source from [23]
Question answering with RNNs:
1. Extract representations (word features) of the question and of the passage (context)
2. Combine the question and context representations
3. Generate the prediction from the combined features
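A hedged sketch of these three steps (the model name, sizes, and the answer-classification head are illustrative assumptions; real readers over the datasets cited above also use attention between question and passage):

```python
# Steps 1-3: encode question and passage with RNNs, combine their final
# states, and predict an answer with a softmax.
import torch
import torch.nn as nn

class SimpleQA(nn.Module):
    def __init__(self, vocab, emb_dim=64, hid=128, n_answers=500):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb_dim)
        self.q_rnn = nn.GRU(emb_dim, hid, batch_first=True)  # step 1: question encoder
        self.p_rnn = nn.GRU(emb_dim, hid, batch_first=True)  # step 1: passage encoder
        self.out = nn.Linear(2 * hid, n_answers)             # step 3: prediction layer

    def forward(self, question, passage):
        _, q = self.q_rnn(self.emb(question))                # final question state
        _, p = self.p_rnn(self.emb(passage))                 # final passage state
        combined = torch.cat([q[-1], p[-1]], dim=-1)         # step 2: combine
        return torch.softmax(self.out(combined), dim=-1)     # distribution over answers

model = SimpleQA(vocab=2000)
q = torch.randint(0, 2000, (2, 7))                           # batch of questions
p = torch.randint(0, 2000, (2, 40))                          # matching passages
probs = model(q, p)                                          # shape (2, 500)
```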
[Figure: RNNs encode the question and the passage/context; the combined representation feeds a softmax layer that outputs the prediction.]
References
J. Donahue et al., Long-term Recurrent Convolutional Networks for Visual Recognition and Description, arXiv:1411.4389 / CVPR 2015
K. M. Hermann et al., Teaching Machines to Read and Comprehend, arXiv:1506.03340 / NIPS 2015
A. Kumar et al., Ask Me Anything: Dynamic Memory Networks for Natural Language Processing, arXiv:1506.07285
[Summary figure: encoder RNN states $h_1, \dots, h_4$; attention combines them into $c_t$; the decoder RNN computes $s_t$ from $s_{t-1}$, $y_{t-1}$, and $c_t$, and emits $y_t$.]