Deep Learning: Theory and Practice - Recurrent Neural Networks (30-04-2019)


SLIDE 1

Deep Learning: Theory and Practice

30-04-2019

Recurrent Neural Networks

SLIDE 2

Introduction

❖ The standard DNN/CNN paradigms
  ❖ (x, y): an ordered pair of a data vector/image (x) and a target (y)
❖ Moving to sequence data (see the shape sketch below)
  ❖ (x(t), y(t)), where this could be a sequence-to-sequence mapping task.
  ❖ (x(t), y), where this could be a sequence-to-vector mapping task.
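A minimal sketch of what the two sequence settings look like as arrays; the dimensions and names below are illustrative assumptions, not values from the lecture.

import numpy as np

# Illustrative shapes only: T time steps, 40-dim input features, 10 classes.
T, d_in, n_classes = 100, 40, 10

x = np.random.randn(T, d_in)                # x(t): one feature vector per time step

# sequence-to-sequence: one target per time step, e.g. framewise labels
y_seq = np.random.randint(n_classes, size=T)

# sequence-to-vector: a single target for the whole sequence, e.g. a class label
y_vec = np.random.randint(n_classes)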

SLIDE 3

Introduction

❖ Differences from standard CNNs/DNNs
  ❖ (x(t), y(t)), where this could be a sequence-to-sequence mapping task.
❖ Input features / output targets are correlated in time.
  ❖ Unlike standard models, where each (x, y) pair is independent.
❖ Need to model dependencies in the sequence over time.

SLIDE 4

Introduction to Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

SLIDE 5

Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

SLIDE 6

Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville
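For reference, the forward pass that the recurrent-network figures on these slides depict, in the standard formulation of the cited textbook; the exact symbols used on the slides may differ. Here U, W, V are the input-to-hidden, hidden-to-hidden, and hidden-to-output weights, with biases b and c:

\begin{aligned}
a^{(t)} &= b + W h^{(t-1)} + U x^{(t)} \\
h^{(t)} &= \tanh\!\left(a^{(t)}\right) \\
o^{(t)} &= c + V h^{(t)} \\
\hat{y}^{(t)} &= \operatorname{softmax}\!\left(o^{(t)}\right)
\end{aligned}

The same parameters (U, W, V, b, c) are shared across every time step, which is what distinguishes the recurrent model from a feed-forward network unrolled in time.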

SLIDE 7

Back Propagation in RNNs

Model Parameters
Gradient Descent
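A sketch of the update being referred to, assuming the parameter set θ = {U, W, V, b, c} from the recurrence above, a loss L summed over the time steps, and learning rate η:

L = \sum_{t} L^{(t)}\!\left(\hat{y}^{(t)}, y^{(t)}\right), \qquad
\theta \leftarrow \theta - \eta \, \nabla_{\theta} L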

SLIDE 8

Recurrent Networks

SLIDE 9

Back Propagation Through Time

SLIDE 10

SLIDE 11

Back Propagation Through Time
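Back-propagation through time is ordinary back-propagation applied to the unrolled graph: run the recurrence forward for all T steps, then accumulate gradients backwards from t = T-1 down to 0, summing each step's contribution to the shared weights. A minimal numpy sketch, assuming the tanh recurrence above (biases omitted for brevity), a linear output layer V, and a squared-error loss at every step; the names U, W, V are illustrative assumptions:

import numpy as np

def bptt(xs, ys, U, W, V):
    T = len(xs)
    h = {-1: np.zeros(W.shape[0])}
    # forward pass: unroll the recurrence over all T steps
    for t in range(T):
        h[t] = np.tanh(U @ xs[t] + W @ h[t - 1])
    dU, dW, dV = np.zeros_like(U), np.zeros_like(W), np.zeros_like(V)
    dh_next = np.zeros(W.shape[0])
    # backward pass: accumulate gradients from t = T-1 down to 0
    for t in reversed(range(T)):
        dy = V @ h[t] - ys[t]            # d(squared error)/d(output) at step t
        dV += np.outer(dy, h[t])
        dh = V.T @ dy + dh_next          # gradient reaching h[t] from the loss and from step t+1
        da = (1.0 - h[t] ** 2) * dh      # back through the tanh non-linearity
        dU += np.outer(da, xs[t])
        dW += np.outer(da, h[t - 1])
        dh_next = W.T @ da               # gradient passed on to step t-1
    return dU, dW, dV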

SLIDE 12

Standard Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

SLIDE 13

Other Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

Teacher Forcing Networks

SLIDE 14

Recurrent Networks

“Deep Learning”, Ian Goodfellow, Yoshua Bengio, Aaron Courville

Teacher Forcing Networks
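A minimal sketch of the teacher-forcing idea: during training, the recurrence is conditioned on the ground-truth previous output y(t-1) rather than the model's own previous prediction, so each step can be trained without waiting for the model to produce good outputs. The parameter names U, W, R, V are assumptions for illustration:

import numpy as np

def teacher_forcing_step(x_t, y_prev_true, h_prev, U, W, R, V):
    # the hidden state sees the current input and the *true* previous target
    h_t = np.tanh(U @ x_t + W @ h_prev + R @ y_prev_true)
    y_hat_t = V @ h_t                    # prediction for the current step
    return h_t, y_hat_t

# At test time the true y(t-1) is unavailable, so the model's own previous
# prediction y_hat(t-1) is fed back in its place.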

SLIDE 15

Recurrent Networks

Multiple Input Single Output

SLIDE 16

Recurrent Networks

Single Input Multiple Output

SLIDE 17

Recurrent Networks

Bi-directional Networks
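The bidirectional idea in equations, a sketch with assumed parameter names: one recurrence runs forward over the sequence, another runs backward, and the output at each step sees both states.

\begin{aligned}
\overrightarrow{h}^{(t)} &= \tanh\!\left(\overrightarrow{U} x^{(t)} + \overrightarrow{W}\,\overrightarrow{h}^{(t-1)}\right) \\
\overleftarrow{h}^{(t)} &= \tanh\!\left(\overleftarrow{U} x^{(t)} + \overleftarrow{W}\,\overleftarrow{h}^{(t+1)}\right) \\
o^{(t)} &= V \left[\overrightarrow{h}^{(t)};\, \overleftarrow{h}^{(t)}\right] + c
\end{aligned}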

SLIDE 18

Recurrent Networks

Sequence to Sequence Mapping Networks

SLIDE 19

SLIDE 20

Long-term Dependency Issues

SLIDE 21

Vanishing/Exploding Gradients

❖ Gradients either vanish or explode.
  ❖ Initial frames may not contribute to the gradient computation, or may contribute too much (see the sketch below).
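The reason, sketched with the tanh recurrence from the earlier slides: the gradient that reaches an early hidden state is a product of per-step Jacobians, so it shrinks or grows roughly geometrically with the time gap.

\frac{\partial h^{(T)}}{\partial h^{(t)}}
  = \prod_{k=t+1}^{T} \frac{\partial h^{(k)}}{\partial h^{(k-1)}}
  = \prod_{k=t+1}^{T} \operatorname{diag}\!\left(1 - \big(h^{(k)}\big)^{2}\right) W

When the singular values of W are consistently below one the product vanishes, and when they are above one it explodes.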

SLIDE 22

Long Short-Term Memory

SLIDE 23

LSTM Cell

Input gate, forget gate, cell, output gate, LSTM output
f - sigmoid function; g, h - tanh functions
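One common write-up of the cell the slide labels; the notation here is an assumption, not necessarily the slide's. σ is the logistic sigmoid (the slide's f), g and h are tanh, x_t is the input, and m_{t-1} is the previous LSTM output:

\begin{aligned}
i_t &= \sigma\!\left(W_{ix} x_t + W_{im} m_{t-1} + b_i\right) \\
f_t &= \sigma\!\left(W_{fx} x_t + W_{fm} m_{t-1} + b_f\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g\!\left(W_{cx} x_t + W_{cm} m_{t-1} + b_c\right) \\
o_t &= \sigma\!\left(W_{ox} x_t + W_{om} m_{t-1} + b_o\right) \\
m_t &= o_t \odot h\!\left(c_t\right)
\end{aligned}

The additive cell update c_t = f_t ⊙ c_{t-1} + … is what lets gradients flow across long spans without vanishing.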

SLIDE 24

Long Short Term Memory Networks

SLIDE 25

Gated Recurrent Units (GRU)
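For comparison, the GRU merges the input and forget gates into a single update gate z_t and drops the separate cell state. A sketch of one common formulation; which of z_t and 1 - z_t multiplies the old state varies between write-ups:

\begin{aligned}
z_t &= \sigma\!\left(W_z x_t + U_z h_{t-1}\right) \\
r_t &= \sigma\!\left(W_r x_t + U_r h_{t-1}\right) \\
\tilde{h}_t &= \tanh\!\left(W x_t + U\,(r_t \odot h_{t-1})\right) \\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}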

SLIDE 26

Attention in LSTM Networks

❖ Attention provides a mechanism to add relevance.
  ❖ Certain regions of the audio are more important than the rest for the task at hand (see the pooling sketch below).
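A minimal sketch of one way such relevance weighting can be computed over the recurrent hidden states: softmax-normalised scores used to form a weighted average. The parameter names w, W, b are assumptions, not the lecture's:

import numpy as np

def softmax(e):
    e = e - e.max()                # for numerical stability
    p = np.exp(e)
    return p / p.sum()

def attention_pool(h, w, W, b):
    # h: (T, d) hidden states, one per frame; returns a single pooled vector
    e = np.tanh(h @ W + b) @ w     # un-normalised relevance score per frame
    alpha = softmax(e)             # attention weights: non-negative, sum to 1
    return alpha @ h, alpha        # weighted sum of states, plus the weights

Frames with large alpha contribute most to the pooled representation.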

SLIDE 27

SLIDE 28

Encoder-Decoder Networks with Attention
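In the encoder-decoder setting the weighting is recomputed at every decoder step. A sketch of the usual formulation, where h_s are encoder states, s_{t-1} is the previous decoder state, and score(·,·) is a small learned function; the exact form is an assumption:

\begin{aligned}
e_{t,s} &= \operatorname{score}\!\left(s_{t-1}, h_s\right) \\
\alpha_{t,s} &= \frac{\exp\!\left(e_{t,s}\right)}{\sum_{s'} \exp\!\left(e_{t,s'}\right)} \\
c_t &= \sum_{s} \alpha_{t,s}\, h_s \\
s_t &= \operatorname{RNN}\!\left(s_{t-1}, y_{t-1}, c_t\right)
\end{aligned}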

SLIDE 29

Attention Models

SLIDE 30

Attention - Speech Example

From our lab (part of an ICASSP 2019 paper).

SLIDE 31

Language Recognition Evaluation

SLIDE 32

End-to-end model using GRUs and Attention

SLIDE 33

Proposed End-to-End Language Recognition Model

SLIDE 34

Proposed End-to-End Language Recognition Model

SLIDE 35

Proposed End-to-End Language Recognition Model

SLIDE 36

Language Recognition Evaluation

0-3s: O... One muscle at all, it was terrible
3s-4s: ... ah ... ah ...
4s-9s: I couldn't scream, I couldn't shout, I couldn't even move my arms up, or my legs
9s-11s: I was trying me hardest, I was really really panicking.

[Figure: attention weights over time, aligned with the transcript above]

Bharat Padi et al., “End-to-end language recognition using hierarchical gated recurrent networks,” under review, 2018.

We proposed the attention model: attention weighs the importance of each short-term segment feature for the task. State-of-the-art models use the input sequence directly.

SLIDE 37

Language Recognition Evaluation

SLIDE 38

Language Recognition Evaluation