SLIDE 1

CS 6956: Deep Learning for NLP

Recurrent Neural Networks

SLIDE 2

Overview

1. Modeling sequences
2. Recurrent neural networks: An abstraction
3. Usage patterns for RNNs
4. Bidirectional RNNs
5. A concrete example: The Elman RNN
6. The vanishing gradient problem
7. Long short-term memory units

SLIDE 4

Recurrent neural networks

  • First introduced by Elman (1990)
  • Provide a mechanism for encoding sequences of arbitrary length into vectors that capture the sequential information
  • Currently, perhaps one of the most commonly used tools in the deep learning toolkit for NLP applications

SLIDE 5

The RNN abstraction

A high level overview that doesn’t go into details

[Figure: an RNN cell, shown as a box with an input and an output]

An RNN cell is a unit of differentiable compute that maps inputs to outputs

SLIDE 6

The RNN abstraction

A high level overview that doesn’t go into details

[Figure: an RNN cell, shown as a box with an input and an output]

An RNN cell is a unit of differentiable compute that maps inputs to outputs. So far, we have no way to build a sequence of such cells.

SLIDE 7

The RNN abstraction

A high level overview that doesn’t go into details

[Figure: an RNN cell with an input, an output, and a recurrent input]

To allow these cells to be composed, each cell takes a recurrent input from the previous such cell

SLIDE 8

The RNN abstraction

A high level overview that doesn’t go into details

[Figure: an RNN cell with an input, an output, a recurrent input, and a recurrent output]

To allow these cells to be composed, each cell takes a recurrent input from the previous such cell. In addition to the output, each cell also produces a recurrent output that can serve as a memory of past states for the next such cell.

SLIDE 9

The RNN abstraction

A high level overview that doesn’t go into details

Conceptually, the cell performs two operations. Using the input and the recurrent input (also called the previous cell state), it computes:

  • 1. The next cell state
  • 2. The output
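
To make the two operations concrete, here is a minimal NumPy sketch of such a cell. The specific choices below (a tanh layer for computing the next state, the identity for the output) are illustrative assumptions, not definitions from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Illustrative parameters; any differentiable functions fit the abstraction.
W_in = rng.normal(size=(dim, dim))    # applied to the input
W_rec = rng.normal(size=(dim, dim))   # applied to the previous cell state

def rnn_cell(x, prev_state):
    """One step of the abstract cell: the two conceptual operations."""
    next_state = np.tanh(W_in @ x + W_rec @ prev_state)  # 1. the next cell state
    output = next_state                                  # 2. the output (identity here)
    return output, next_state

# One step from an all-zeros previous state:
out, state = rnn_cell(rng.normal(size=dim), np.zeros(dim))
```
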
SLIDE 10

The RNN abstraction: A simple example

[Figure: the sentence “John lives in Salt Lake City”, with an RNN cell template that is unrolled for each input]

SLIDE 11

The RNN abstraction: A simple example

[Figure: the cell’s computation graph is applied to the initial state and “John”, producing Output 1]

SLIDE 12

The RNN abstraction: A simple example

[Figure: the cells consume “John” and “lives” in turn, producing Outputs 1–2]

SLIDE 13

The RNN abstraction: A simple example

[Figure: the cells consume “John”, “lives”, and “in”, producing Outputs 1–3]

SLIDE 14

The RNN abstraction: A simple example

[Figure: the cells consume “John” through “Salt”, producing Outputs 1–4]

SLIDE 15

The RNN abstraction: A simple example

[Figure: the cells consume “John” through “Lake”, producing Outputs 1–5]

SLIDE 16

The RNN abstraction: A simple example

[Figure: the cells consume “John” through “City”, producing Outputs 1–6]

SLIDE 17

The RNN abstraction

[Figure: an RNN cell with an input, an output, a recurrent input, and a recurrent output]

Sometimes this is represented as a “neural network with a loop”. But really, when unrolled, there are no loops: just a big feedforward network.
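
As a hedged sketch of that unrolling (same illustrative cell as above; the random word vectors stand in for learned embeddings), the “loop” is ordinary iteration, and for a fixed sentence it expands into a six-cell feedforward graph with shared parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W_in, W_rec = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))

def rnn_cell(x, prev_state):
    # Same illustrative cell as before: tanh state update, identity output.
    next_state = np.tanh(W_in @ x + W_rec @ prev_state)
    return next_state, next_state  # (output, next state)

sentence = "John lives in Salt Lake City".split()
embed = {w: rng.normal(size=dim) for w in sentence}  # stand-in word vectors

state = np.zeros(dim)                 # the initial state
outputs = []
for word in sentence:                 # the "loop" view ...
    out, state = rnn_cell(embed[word], state)
    outputs.append(out)               # Output 1 ... Output 6
# ... but unrolled over this sentence, the computation is just a fixed
# six-cell feedforward network; no loops remain in the graph.
```
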

SLIDE 18

An abstract RNN: Notation

  • Inputs to cells: $\mathbf{x}_t$ at the $t^{th}$ step
    – These are vectors
  • Cell states (i.e. recurrent inputs and outputs): $\mathbf{s}_t$ at the $t^{th}$ step
    – These are also vectors
  • Outputs: $\mathbf{y}_t$ at the $t^{th}$ step
    – These are also vectors
  • At each step:
    – Compute the next cell state: $\mathbf{s}_{t+1} = R(\mathbf{x}_t, \mathbf{s}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_{t+1})$


SLIDE 21

An abstract RNN: Notation

  • Inputs to cells: $\mathbf{x}_t$ at the $t^{th}$ step
    – These are vectors
  • Cell states (i.e. recurrent inputs and outputs): $\mathbf{s}_t$ at the $t^{th}$ step
    – These are also vectors
  • Outputs: $\mathbf{y}_t$ at the $t^{th}$ step
    – These are also vectors
  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$

SLIDE 22

An abstract RNN: Notation

  • Inputs to cells: $\mathbf{x}_t$ at the $t^{th}$ step
    – These are vectors
  • Cell states (i.e. recurrent inputs and outputs): $\mathbf{s}_t$ at the $t^{th}$ step
    – These are also vectors
  • Outputs: $\mathbf{y}_t$ at the $t^{th}$ step
    – These are also vectors
  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$

Both these functions can be parameterized. That is, they can be neural networks whose parameters are trained.
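
As a sketch of what “parameterized” means here, R and O below are small trainable layers: an Elman-style tanh layer for the state update and a linear layer for the output. These specific forms, dimensions, and the random initialization are assumptions for illustration; the deck’s concrete parameterization (the Elman RNN) comes later.

```python
import numpy as np

rng = np.random.default_rng(0)
in_dim, state_dim, out_dim = 3, 4, 2

# Trainable parameters of R and O (randomly initialized here; training
# would adjust them by backpropagation through the unrolled network).
W_x = rng.normal(size=(state_dim, in_dim))
W_s = rng.normal(size=(state_dim, state_dim))
b_s = np.zeros(state_dim)
W_o = rng.normal(size=(out_dim, state_dim))
b_o = np.zeros(out_dim)

def R(prev_state, x):
    """Assumed state update: an Elman-style tanh layer."""
    return np.tanh(W_x @ x + W_s @ prev_state + b_s)

def O(state):
    """Assumed output function: a linear layer on the cell state."""
    return W_o @ state + b_o

# One step: s_1 = R(s_0, x_1), y_1 = O(s_1)
s1 = R(np.zeros(state_dim), rng.normal(size=in_dim))
y1 = O(s1)
```
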

SLIDE 23

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$

SLIDE 24

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
    – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$

SLIDE 25

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
    – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$

Encodes the sequence up to $t = 2$ into a single vector

SLIDE 26

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
    – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$
    – $\mathbf{s}_3 = R(\mathbf{s}_2, \mathbf{x}_3) = R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3)$

SLIDE 27

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
    – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$
    – $\mathbf{s}_3 = R(\mathbf{s}_2, \mathbf{x}_3) = R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3)$

Encodes the sequence up to $t = 3$ into a single vector

SLIDE 28

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
    – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$
    – $\mathbf{s}_3 = R(\mathbf{s}_2, \mathbf{x}_3) = R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3)$
    – $\mathbf{s}_4 = R(\mathbf{s}_3, \mathbf{x}_4) = R(R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3), \mathbf{x}_4)$

SLIDE 29

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
    – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$
    – $\mathbf{s}_3 = R(\mathbf{s}_2, \mathbf{x}_3) = R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3)$
    – $\mathbf{s}_4 = R(\mathbf{s}_3, \mathbf{x}_4) = R(R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3), \mathbf{x}_4)$

Encodes the sequence up to $t = 4$ into a single vector

SLIDE 30

What does unrolling the RNN do?

  • At each step:
    – Compute the next cell state: $\mathbf{s}_t = R(\mathbf{s}_{t-1}, \mathbf{x}_t)$
    – Compute the output: $\mathbf{y}_t = O(\mathbf{s}_t)$
  • We can write this as:
    – $\mathbf{s}_1 = R(\mathbf{s}_0, \mathbf{x}_1)$
    – $\mathbf{s}_2 = R(\mathbf{s}_1, \mathbf{x}_2) = R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2)$
    – $\mathbf{s}_3 = R(\mathbf{s}_2, \mathbf{x}_3) = R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3)$
    – $\mathbf{s}_4 = R(\mathbf{s}_3, \mathbf{x}_4) = R(R(R(R(\mathbf{s}_0, \mathbf{x}_1), \mathbf{x}_2), \mathbf{x}_3), \mathbf{x}_4)$
    – … and so on

Encodes the sequence up to $t = 4$ into a single vector
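
The nesting above is just repeated function composition, which a short NumPy check can confirm: folding R over the inputs (random stand-ins here, with the same assumed tanh update as in the earlier sketches) yields exactly the step-by-step state.

```python
from functools import reduce
import numpy as np

rng = np.random.default_rng(0)
dim = 4
W_in, W_rec = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))

def R(state, x):
    # Illustrative state-update function, as in the earlier sketches.
    return np.tanh(W_in @ x + W_rec @ state)

xs = [rng.normal(size=dim) for _ in range(4)]   # x_1 ... x_4
s0 = np.zeros(dim)

# Step-by-step unrolling ...
s = s0
for x in xs:
    s = R(s, x)

# ... equals the nested composition R(R(R(R(s0, x1), x2), x3), x4):
s4 = reduce(R, xs, s0)
assert np.allclose(s, s4)   # s4 encodes the sequence up to t = 4
```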