SLIDE 1

Gated Orthogonal Recurrent Units: On Learning to Forget

Li Jing, Çağlar Gülçehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

SLIDE 2

Gradient Vanishing/Explosion Problem

  • During backpropagation through time, the hidden-to-hidden Jacobian matrix is multiplied many times.
  • Gradient vanishing/explosion makes RNNs hard to train.
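A minimal NumPy sketch of this effect (the scale factors 0.9 and 1.1 and the 50-step horizon are illustrative choices, not from the slides): repeatedly multiplying a gradient by a Jacobian whose spectral radius is below 1 shrinks it exponentially, while a radius above 1 blows it up.

```python
import numpy as np

T = 50                         # number of unrolled time steps
W_small = 0.9 * np.eye(4)      # spectral radius < 1: gradients vanish
W_large = 1.1 * np.eye(4)      # spectral radius > 1: gradients explode

g_vanish = np.ones(4)          # stand-in for a backpropagated gradient
g_explode = np.ones(4)
for _ in range(T):
    # BPTT multiplies the gradient by the hidden-to-hidden Jacobian at each step.
    g_vanish = W_small.T @ g_vanish
    g_explode = W_large.T @ g_explode

print(np.linalg.norm(g_vanish))   # ~ 0.9**50 * 2, about 0.01
print(np.linalg.norm(g_explode))  # ~ 1.1**50 * 2, about 235
```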

Li Jing

SLIDE 3

Conventional Solution: LSTM

  • Practically, gradient clipping is required
  • Slow to learn long-term dependencies
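The clipping step mentioned above can be sketched as global-norm clipping; the threshold 5.0 and the function name are illustrative, not from the slides:

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad so its L2 norm never exceeds max_norm (global-norm clipping)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.full(100, 3.0)            # norm = 30, well above the threshold
clipped = clip_gradient(g)
print(np.linalg.norm(clipped))   # rescaled down to the threshold, 5.0
```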


SLIDE 4

Unitary/Orthogonal RNN

Unitary/orthogonal matrices preserve the norm of vectors: ‖Uv‖ = ‖v‖.

By enforcing the hidden-to-hidden transition matrix to be unitary/orthogonal, no matter how many time steps are propagated, the norm of the gradient stays the same.
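The norm-preservation claim is easy to check numerically. A sketch using a random orthogonal matrix obtained via QR decomposition, applied 1000 times:

```python
import numpy as np

rng = np.random.default_rng(1)
# QR decomposition of a random Gaussian matrix yields a random orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((128, 128)))

v = rng.standard_normal(128)
w = v.copy()
for _ in range(1000):   # apply the orthogonal transition matrix 1000 times
    w = Q @ w

# The norm is preserved to floating-point precision, regardless of step count.
print(np.linalg.norm(v), np.linalg.norm(w))
```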


  • Restricted-capacity Unitary Matrix Parametrization (Arjovsky, ICML 2016)
  • Full-capacity Unitary Matrix by projection (Wisdom, NIPS 2016)
  • Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN (Jing, ICML 2017)
  • Orthogonal Matrix Parametrization by reflection (Mhammedi, ICML 2017)
  • Orthogonal Matrix by regularization (Vorontsov, ICML 2017)
SLIDE 5

Limitations of basic Orthogonal RNNs

  • No forgetting mechanism
  • Limited memory size
SLIDE 6

Applying Gated System to Orthogonal RNN

[Diagram: the GORU cell, with input (Wx) and output, reset gate r, update gate z (and 1−z), orthogonal transition matrix U, and a modReLU nonlinearity.]

Gated Orthogonal Recurrent Unit: unitary/orthogonal matrices handle long-term dependency; the gated mechanism handles forgetting.
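A minimal NumPy sketch of the cell shown above, assuming GRU-style reset/update gates wrapped around an orthogonal transition matrix U and a modReLU nonlinearity; the function and parameter names are illustrative, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modrelu(x, b):
    """modReLU: shrink the magnitude of x by ReLU(|x| + b), keeping its sign."""
    mag = np.abs(x) + b
    return np.where(mag > 0, mag, 0.0) * np.sign(x)

def goru_step(x, h, params):
    """One GORU step (sketch): GRU-style gates, but the candidate transition
    matrix U is constrained to be orthogonal."""
    Wz, Uz, bz, Wr, Ur, br, W, U, b = params
    z = sigmoid(Wz @ x + Uz @ h + bz)          # update gate (z vs. 1 - z)
    r = sigmoid(Wr @ x + Ur @ h + br)          # reset gate
    h_tilde = modrelu(W @ x + r * (U @ h), b)  # candidate state, U orthogonal
    return z * h + (1 - z) * h_tilde           # interpolate old and new state
```

The update gate z lets the cell forget (overwrite h with the candidate state), while the orthogonal U keeps the candidate path norm-preserving.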

SLIDE 7

Experiment results

  • Copying Task
  • Denoise Task
  • Parenthesis Task

Synthetic tasks: GORU is the only model that succeeds in all three tasks.

SLIDE 8

Experiment results

  • Question Answering Task
  • Speech Task

Real tasks: GORU outperforms all other models compared.

SLIDE 9

Thank you