SLIDE 1

Gated Orthogonal Recurrent Units: On Learning to Forget

Li Jing, Çağlar Gülçehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljačić, Yoshua Bengio

SLIDE 2

Gradient Vanishing/Explosion Problem

  • During backpropagation through time, the hidden-to-hidden Jacobian matrix is multiplied many times.
  • Gradient vanishing/explosion makes RNNs hard to train.
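A minimal NumPy sketch of this effect (the scale factors 0.9 and 1.1 and the 50-step horizon are illustrative choices, not from the slides): repeatedly multiplying a gradient by a Jacobian whose spectral radius is below 1 shrinks it exponentially, while a radius above 1 blows it up.

```python
import numpy as np

T = 50                         # number of unrolled time steps
W_small = 0.9 * np.eye(4)      # spectral radius < 1: gradients vanish
W_large = 1.1 * np.eye(4)      # spectral radius > 1: gradients explode

g_vanish = np.ones(4)          # stand-in for a backpropagated gradient
g_explode = np.ones(4)
for _ in range(T):
    # BPTT multiplies the gradient by the hidden-to-hidden Jacobian at each step.
    g_vanish = W_small.T @ g_vanish
    g_explode = W_large.T @ g_explode

print(np.linalg.norm(g_vanish))   # ~ 0.9**50 * 2, about 0.01
print(np.linalg.norm(g_explode))  # ~ 1.1**50 * 2, about 235
```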

Li Jing

SLIDE 3

Conventional Solution: LSTM

  • Practically, gradient clipping is required
  • Slow to learn long-term dependencies
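The clipping step mentioned above can be sketched as global-norm clipping; the threshold 5.0 and the function name are illustrative, not from the slides:

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale grad so its L2 norm never exceeds max_norm (global-norm clipping)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.full(100, 3.0)            # norm = 30, well above the threshold
clipped = clip_gradient(g)
print(np.linalg.norm(clipped))   # rescaled down to the threshold, 5.0
```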


SLIDE 4

Unitary/Orthogonal RNN

Unitary/orthogonal matrices preserve the norm of vectors: ‖Uv‖ = ‖v‖.

By enforcing the hidden-to-hidden transition matrix to be unitary/orthogonal, no matter how many time steps are propagated, the norm of the gradient stays the same.
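The norm-preservation claim is easy to check numerically. A sketch using a random orthogonal matrix obtained via QR decomposition, applied 1000 times:

```python
import numpy as np

rng = np.random.default_rng(1)
# QR decomposition of a random Gaussian matrix yields a random orthogonal Q.
Q, _ = np.linalg.qr(rng.standard_normal((128, 128)))

v = rng.standard_normal(128)
w = v.copy()
for _ in range(1000):   # apply the orthogonal transition matrix 1000 times
    w = Q @ w

# The norm is preserved to floating-point precision, regardless of step count.
print(np.linalg.norm(v), np.linalg.norm(w))
```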


  • Restricted-capacity Unitary Matrix Parametrization (Arjovsky, ICML 2016)
  • Full-capacity Unitary Matrix by projection (Wisdom, NIPS 2016)
  • Tunable Efficient Unitary Neural Networks (EUNN) and their application to RNN (Jing, ICML 2017)
  • Orthogonal Matrix Parametrization by reflection (Mhammedi, ICML 2017)
  • Orthogonal Matrix by regularization (Vorontsov, ICML 2017)
SLIDE 5

Limitations of basic Orthogonal RNNs

  • No forgetting mechanism
  • Limited memory size
SLIDE 6

Applying Gated System to Orthogonal RNN

[Diagram: the GORU cell, with input (Wx) and output, reset gate r, update gate z (and 1−z), orthogonal transition matrix U, and a modReLU nonlinearity.]

Gated Orthogonal Recurrent Unit: unitary/orthogonal matrices handle long-term dependency; the gated mechanism handles forgetting.
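A minimal NumPy sketch of the cell shown above, assuming GRU-style reset/update gates wrapped around an orthogonal transition matrix U and a modReLU nonlinearity; the function and parameter names are illustrative, not the authors' code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def modrelu(x, b):
    """modReLU: shrink the magnitude of x by ReLU(|x| + b), keeping its sign."""
    mag = np.abs(x) + b
    return np.where(mag > 0, mag, 0.0) * np.sign(x)

def goru_step(x, h, params):
    """One GORU step (sketch): GRU-style gates, but the candidate transition
    matrix U is constrained to be orthogonal."""
    Wz, Uz, bz, Wr, Ur, br, W, U, b = params
    z = sigmoid(Wz @ x + Uz @ h + bz)          # update gate (z vs. 1 - z)
    r = sigmoid(Wr @ x + Ur @ h + br)          # reset gate
    h_tilde = modrelu(W @ x + r * (U @ h), b)  # candidate state, U orthogonal
    return z * h + (1 - z) * h_tilde           # interpolate old and new state
```

The update gate z lets the cell forget (overwrite h with the candidate state), while the orthogonal U keeps the candidate path norm-preserving.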

SLIDE 7

Experiment results

  • Copying Task
  • Denoise Task
  • Parenthesis Task

Synthetic tasks: GORU is the only model that succeeds in all three tasks.

SLIDE 8

Experiment results

  • Question Answering Task
  • Speech Task

Real tasks: GORU outperforms all other models compared.

SLIDE 9

Thank you