
  1. Outline
     • Gated Feedback Recurrent Neural Networks. arXiv 1502.
       - Introduction: RNN & Gated RNN
       - Gated Feedback Recurrent Neural Networks (GF-RNN)
       - Experiments: Character-level Language Modeling & Python Program Evaluation
     • ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv 1505.
       - Introduction
       - ReNet: 4 RNNs that sweep over lower-layer features in 4 directions
       - Experiments: MNIST & CIFAR-10 & Street View House Numbers
     LU Yangyang, luyy11@pku.edu.cn, May 2015 @ KERE Seminar

  2. Authors
     • Gated Feedback Recurrent Neural Networks
       - arXiv.org, 9 Feb 2015 - 18 Feb 2015
       - Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio (University of Montreal)
     • ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
       - arXiv.org, 3 May 2015
       - Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio (University of Montreal)

  3. Outline
     • Gated Feedback Recurrent Neural Networks. arXiv 1502.
       - Introduction: RNN & Gated RNN
       - Gated Feedback Recurrent Neural Networks (GF-RNN)
       - Experiments: Character-level Language Modeling & Python Program Evaluation
     • ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv 1505.

  4. Recurrent Neural Networks (RNN) for Sequence Modeling
     • Can process a sequence of arbitrary length
     • Recursively applies a transition function to its internal hidden state for each symbol of the input sequence
     • Can theoretically capture any long-term dependency in an input sequence, but it is difficult to train an RNN to actually do so
     • Hidden-state update: $h_t = f(x_t, h_{t-1}) = \phi(W x_t + U h_{t-1})$
     • Sequence probability: $p(x_1, x_2, \ldots, x_T) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_T \mid x_1, \ldots, x_{T-1})$, with $p(x_{t+1} \mid x_1, \ldots, x_t) = g(h_t)$
     Figure: A single-layer RNN
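
A minimal NumPy sketch of this recurrence and of the readout g; the softmax readout, the weight shapes, and the toy dimensions are illustrative assumptions, not details from the paper:

    import numpy as np

    def rnn_step(x_t, h_prev, W, U, phi=np.tanh):
        """One step of the transition function h_t = phi(W x_t + U h_{t-1})."""
        return phi(W @ x_t + U @ h_prev)

    def next_symbol_dist(h_t, V):
        """g(h_t): a softmax readout over the vocabulary (assumed form)."""
        logits = V @ h_t
        e = np.exp(logits - logits.max())
        return e / e.sum()

    # toy dimensions (assumed): vocabulary of 5 symbols, 8 hidden units
    rng = np.random.default_rng(0)
    W, U, V = rng.normal(size=(8, 5)), rng.normal(size=(8, 8)), rng.normal(size=(5, 8))
    h = np.zeros(8)
    for x in np.eye(5)[[0, 2, 1]]:          # a short one-hot input sequence
        h = rnn_step(x, h, W, U)
    print(next_symbol_dist(h, V))           # p(x_{t+1} | x_1..x_t)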

  5. Gated Recurrent Neural Networks: LSTM & GRU [1]
     • An LSTM unit:
       $h_t^j = o_t^j \tanh(c_t^j)$
       $c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j$
       $\tilde{c}_t^j = \tanh(W_c x_t + U_c h_{t-1})^j$
       $o_t^j = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t)^j$
       $f_t^j = \sigma(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j$
       $i_t^j = \sigma(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j$
     • A GRU unit:
       $h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j \tilde{h}_t^j$
       $z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j$
       $\tilde{h}_t^j = \tanh(W x_t + U (r_t \odot h_{t-1}))^j$
       $r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j$
     [1] Chung, J., et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv, 2014.
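
A minimal NumPy sketch of one GRU step as written above; the parameter dictionary, dimensions, and initialization scale are illustrative assumptions:

    import numpy as np

    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    def gru_step(x_t, h_prev, P):
        """One GRU step: reset gate r, update gate z, candidate h_tilde, new state h."""
        r = sigmoid(P["Wr"] @ x_t + P["Ur"] @ h_prev)
        z = sigmoid(P["Wz"] @ x_t + P["Uz"] @ h_prev)
        h_tilde = np.tanh(P["W"] @ x_t + P["U"] @ (r * h_prev))
        return (1.0 - z) * h_prev + z * h_tilde

    # toy dimensions (assumed): 5-dimensional input, 8 hidden units
    rng = np.random.default_rng(0)
    dims = {"Wr": (8, 5), "Ur": (8, 8), "Wz": (8, 5), "Uz": (8, 8), "W": (8, 5), "U": (8, 8)}
    P = {k: 0.1 * rng.normal(size=s) for k, s in dims.items()}
    h = np.zeros(8)
    for x in rng.normal(size=(3, 5)):        # a short random input sequence
        h = gru_step(x, h, P)
    print(h.shape)                           # (8,)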

  6. Gated Recurrent Neural Networks: Modifying the RNN Architecture
     • Using a gated activation function:
       - the long short-term memory unit (LSTM): a memory cell, an input gate, a forget gate, and an output gate
       - the gated recurrent unit (GRU): a reset gate and an update gate
     • Can contain both fast-changing and slow-changing components:
       - stacked multiple levels of recurrent layers
       - partitioned and grouped hidden units to allow feedback information at multiple timescales
     • Achieved promising results in both classification and generation tasks
     ⇒ Gated Feedback RNN (GF-RNN): learning multiple adaptive timescales

  7. GF-RNN: Overview
     Figure: A Clockwork RNN
     • A sequence often consists of both slow-moving and fast-moving components:
       - slow-moving: long-term dependencies
       - fast-moving: short-term dependencies
     • El Hihi & Bengio (1995): an RNN can capture these dependencies of different timescales more easily and efficiently when its hidden units are explicitly partitioned into groups that correspond to different timescales.
     • The clockwork RNN (CW-RNN) (Koutnik et al., 2014): the i-th module is updated only when $t \bmod 2^{i-1} = 0$ (a tiny schedule example is sketched below).
     ⇒ Goal: generalize the CW-RNN by allowing the model to adaptively adjust the connectivity pattern between the hidden layers at consecutive time-steps
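
A tiny illustration of the fixed CW-RNN update schedule; the number of modules and the time horizon are chosen arbitrarily:

    # Which CW-RNN modules update at each time-step under t mod 2^(i-1) == 0.
    for t in range(1, 9):
        active = [i for i in range(1, 5) if t % 2 ** (i - 1) == 0]
        print(f"t={t}: update modules {active}")
    # Module 1 updates every step, module 2 every 2 steps, module 3 every 4 steps, ...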

  8. GF-RNN: Overview (cont.)
     • Partition the hidden units into multiple modules: each module corresponds to a different layer in a stack of recurrent layers
     • Compared to the CW-RNN: no explicit rate is set for each module; the hierarchically stacked modules learn to operate at different timescales
     • Each module is fully connected to all the other modules across the stack and to itself
     • The global reset gate: gates the recurrent connection between two modules based on the current input and the previous states of the hidden layers

  9. GF-RNN: The global reset gate
     • $h_t^i$: the hidden state of the i-th layer at time-step t
     • $w_g^{i \to j}$, $u_g^{i \to j}$: weights for the current input and for the hidden states of all layers at time-step t-1
     • $g^{i \to j}$: controls the signal from the i-th layer $h_{t-1}^i$ to the j-th layer $h_t^j$, based on the current input and the previous hidden states

  10. GF-RNN: The global reset gate (cont.)
     Information flows:
     • stacked RNN & GF-RNN: lower layers → upper layers
     • GF-RNN only: lower layers ← upper layers (finer timescale ← coarser timescale)
     A gated-feedback RNN is a fully-connected recurrent transition plus global reset gates; a sketch of the resulting update follows below.
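
A minimal NumPy sketch of my reading of this mechanism for plain tanh units: each gate g^{i→j} is a scalar computed from the input to layer j and the concatenated previous hidden states, and it scales the feedback from layer i into layer j. Shapes, initialization, and variable names are assumptions for illustration, not the paper's exact parameterization:

    import numpy as np

    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    def gf_rnn_step(x_t, h_prev, W, U, wg, ug):
        """One GF-RNN step with tanh units over L stacked layers.

        h_prev: list of previous hidden states, one per layer.
        W[j]:   input weights for layer j (the input is x_t for j=0, else h_t of layer j-1).
        U[i][j]: recurrent weights from layer i (at t-1) to layer j (at t).
        wg[i][j], ug[i][j]: parameters of the scalar global reset gate g^{i->j}.
        """
        L = len(h_prev)
        h_star = np.concatenate(h_prev)              # all previous hidden states
        h_new, inp = [], x_t
        for j in range(L):
            feedback = 0.0
            for i in range(L):
                g = sigmoid(wg[i][j] @ inp + ug[i][j] @ h_star)   # scalar gate g^{i->j}
                feedback = feedback + g * (U[i][j] @ h_prev[i])
            h_j = np.tanh(W[j] @ inp + feedback)
            h_new.append(h_j)
            inp = h_j                                # layer j's output feeds layer j+1
        return h_new

    # toy setup (assumed): 3 layers of 4 units, 5-dimensional input
    rng = np.random.default_rng(0)
    L, d, nx = 3, 4, 5
    in_dims = [nx] + [d] * (L - 1)
    W = [0.1 * rng.normal(size=(d, in_dims[j])) for j in range(L)]
    U = [[0.1 * rng.normal(size=(d, d)) for _ in range(L)] for _ in range(L)]
    wg = [[0.1 * rng.normal(size=in_dims[j]) for j in range(L)] for _ in range(L)]
    ug = [[0.1 * rng.normal(size=L * d) for _ in range(L)] for _ in range(L)]
    h = [np.zeros(d) for _ in range(L)]
    h = gf_rnn_step(rng.normal(size=nx), h, W, U, wg, ug)
    print([v.shape for v in h])                      # [(4,), (4,), (4,)]

Since each g^{i→j} is a single scalar per pair of layers, the extra parameter cost over an ordinary stacked RNN stays small.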

  11. GF-RNN: Different Units for Practical Implementation
     • tanh units
     • LSTM & GRU units: the global reset gates are used only when computing the new (candidate) state; see the sketch below for how this looks for a GRU
       LSTM: $h_t^j = o_t^j \tanh(c_t^j)$, $c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j$, with gates
       $f_t^j = \sigma(W_f x_t + U_f h_{t-1} + V_f c_{t-1})^j$, $i_t^j = \sigma(W_i x_t + U_i h_{t-1} + V_i c_{t-1})^j$, $o_t^j = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t)^j$
       GRU: $h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j \tilde{h}_t^j$, with gates
       $z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j$, $r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j$
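
A hedged sketch of my reading of where the global reset gates enter a GRU unit: only the candidate activation h̃ mixes in gated feedback from all layers, while the z and r gates keep their standard form. The function name, shapes, and the exact placement of the reset gate r are assumptions for illustration:

    import numpy as np

    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    def gf_gru_candidate(inp, h_prev_layers, r_j, g, W_j, U_to_j):
        """Candidate state of layer j in a GF-RNN built from GRU units (assumed form).

        inp:            input to layer j at time t (x_t for the first layer).
        h_prev_layers:  previous hidden states of all L layers.
        r_j:            reset gate of layer j (computed as in a plain GRU).
        g:              global reset gates g[i] = g^{i->j}, one scalar per source layer.
        """
        feedback = sum(g[i] * (U_to_j[i] @ h_prev_layers[i]) for i in range(len(h_prev_layers)))
        return np.tanh(W_j @ inp + r_j * feedback)

    # toy check (assumed dimensions): 2 layers of 4 units, 5-dimensional input
    rng = np.random.default_rng(1)
    h_prev = [rng.normal(size=4), rng.normal(size=4)]
    r_j = sigmoid(rng.normal(size=4))
    g = sigmoid(rng.normal(size=2))                  # g^{1->j}, g^{2->j}
    W_j = 0.1 * rng.normal(size=(4, 5))
    U_to_j = [0.1 * rng.normal(size=(4, 4)) for _ in range(2)]
    print(gf_gru_candidate(rng.normal(size=5), h_prev, r_j, g, W_j, U_to_j).shape)   # (4,)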

  12. Experiment Tasks
     Both are representative examples of discrete sequence modeling.
     Objective function: minimize the negative log-likelihood of the training sequences
     • Character-level language modeling:
       - English Wikipedia: 100MB of characters
       - Contents: Latin alphabets, non-Latin alphabets, XML markup and special characters
       - Vocabulary: 205 characters (one token for unknown characters)
       - Train/CV/Test: 90MB / 5MB / 5MB
       - Performance measure: the average number of bits-per-character (BPC, $E[-\log_2 P(x_{t+1} \mid h_t)]$)
     • Python program evaluation:
       - Goal: to generate or predict the correct return value of a given Python script
       - Input: Python scripts (including addition, multiplication, subtraction, for-loops, variable assignment, logical comparison and if-else statements)
       - Output: the predicted value of the given Python script
       - Input/Output vocabularies: 41 / 31 symbols
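
A small NumPy illustration of the BPC measure; the per-step probabilities below are made up for the example:

    import numpy as np

    def bits_per_character(probs_of_true_next_char):
        """Average number of bits-per-character: mean of -log2 P(x_{t+1} | h_t)."""
        p = np.asarray(probs_of_true_next_char)
        return float(np.mean(-np.log2(p)))

    # model probabilities assigned to the correct next character at each step (made up)
    print(bits_per_character([0.5, 0.25, 0.9, 0.1]))   # ≈ 1.62 BPC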

  13. Examples for Python Program Evaluation [2]
     [2] Zaremba, Wojciech and Sutskever, Ilya. Learning to execute. arXiv preprint arXiv:1410.4615, 2014.

  14. Experiments: Character-level Language Modeling
     • The sizes of the models: (table in the slides)
     • Tuning parameters: RMSProp and momentum
     • Test-set BPC of models trained on the Hutter dataset for 100 epochs: (table in the slides)

  15. Experiments: Character-level Language Modeling (cont.)
     Text generation based on the character-level language models:
     • Given the seed in the left-most column (bold-faced), the models predict the next 200-300 characters.
     • Tabs, spaces and new-line characters are also generated by the models.

  16. Experiments: Python Program Evaluation
     Using an RNN encoder-decoder approach:
     • Python script → ENCODER (50 time-steps) → h_t → DECODER → character-level results
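
A minimal NumPy sketch of this encoder-decoder pattern, with plain tanh RNN cells standing in for the gated units actually used; the character codes, dimensions, and the greedy decoding loop are all illustrative assumptions:

    import numpy as np

    def rnn_step(x_t, h_prev, W, U):
        # simple tanh recurrence used for both the encoder and the decoder
        return np.tanh(W @ x_t + U @ h_prev)

    def encode_decode(script_ids, n_in, n_out, d, params, max_out_len=10):
        """Encode an input symbol sequence into h, then greedily decode output symbols."""
        We, Ue, Wd, Ud, V = params
        h = np.zeros(d)
        for s in script_ids:                     # ENCODER: read the whole script
            h = rnn_step(np.eye(n_in)[s], h, We, Ue)
        out, prev = [], np.zeros(n_out)
        for _ in range(max_out_len):             # DECODER: emit one character at a time
            h = rnn_step(prev, h, Wd, Ud)
            sym = int(np.argmax(V @ h))          # greedy choice over the output vocabulary
            out.append(sym)
            prev = np.eye(n_out)[sym]
        return out

    # toy vocabularies roughly matching the slide (41 input / 31 output symbols), 16 hidden units
    rng = np.random.default_rng(0)
    n_in, n_out, d = 41, 31, 16
    params = (0.1 * rng.normal(size=(d, n_in)), 0.1 * rng.normal(size=(d, d)),
              0.1 * rng.normal(size=(d, n_out)), 0.1 * rng.normal(size=(d, d)),
              0.1 * rng.normal(size=(n_out, d)))
    print(encode_decode(rng.integers(0, n_in, size=50), n_in, n_out, d, params))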

  17. Outline
     • Gated Feedback Recurrent Neural Networks. arXiv 1502.
     • ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv 1505.
       - Introduction
       - ReNet: 4 RNNs that sweep over lower-layer features in 4 directions
       - Experiments: MNIST & CIFAR-10 & Street View House Numbers

  18. Introduction: Object Recognition
     • Convolutional Neural Networks (CNN), e.g. LeNet-5: based on a local context window
     • Recurrent Neural Networks:
       - Graves and Schmidhuber (2009): a multi-dimensional RNN
       - ReNet: purely uni-dimensional RNNs that replace each convolutional layer (convolution + pooling) in the CNN
         ⇒ 4 RNNs that sweep over lower-layer features in 4 directions: ↑, ↓, ←, →
         - each feature activation: the activation at a specific location with respect to the whole image

  19. A One-layer ReNet
     • The input image: $x \in \mathbb{R}^{w \times h \times c}$ (width, height, feature dimensionality)
     • Given a patch size $w_p \times h_p$: split the input image x into a set of $I \times J$ non-overlapping patches $X = \{x_{ij}\}$, $x_{ij} \in \mathbb{R}^{w_p \times h_p \times c}$
     1. Sweep the image vertically with 2 RNNs (↑, ↓): each RNN takes one (flattened) patch at a time as input and updates its hidden state, working along each column j of the split input image X.
     2. Concatenate the intermediate hidden states $z^F_{i,j}$ and $z^R_{i,j}$ at each location (i, j) to get a composite feature map $V = \{z_{i,j}\}_{i=1..I,\, j=1..J}$, $z_{i,j} \in \mathbb{R}^{2d}$ (d: the number of recurrent units)
     3. Sweep V horizontally with 2 RNNs (←, →) in a similar manner. The resulting feature map $H = \{z'_{i,j}\}$, $z'_{i,j} \in \mathbb{R}^{2d}$, gives the features of the original image patch $x_{i,j}$ in the context of the whole image.
     The deep ReNet: stack multiple $\phi$'s ($\phi$: the function from X to H), as sketched below.
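
A minimal NumPy sketch of one such layer φ, using plain tanh RNNs for the four sweeps; the patch size, hidden size, flattening order, and weight sharing between the two directions of each sweep are illustrative assumptions (the paper uses separate, gated recurrent units in practice):

    import numpy as np

    def rnn_sweep(seq, W, U):
        """Run a tanh RNN over a sequence of vectors and return all hidden states."""
        h, out = np.zeros(U.shape[0]), []
        for v in seq:
            h = np.tanh(W @ v + U @ h)
            out.append(h)
        return np.stack(out)

    def renet_layer(x, wp, hp, d, rng):
        """One ReNet layer: vertical up/down sweeps, concatenation, then horizontal sweeps."""
        w, h, c = x.shape
        I, J = w // wp, h // hp
        # split into non-overlapping patches and flatten each one
        patches = np.array([[x[i*wp:(i+1)*wp, j*hp:(j+1)*hp, :].ravel()
                             for j in range(J)] for i in range(I)])        # (I, J, wp*hp*c)
        # weights shared between the two directions here only to keep the sketch short
        Wv, Uv = 0.1 * rng.normal(size=(d, wp*hp*c)), 0.1 * rng.normal(size=(d, d))
        Wh, Uh = 0.1 * rng.normal(size=(d, 2*d)), 0.1 * rng.normal(size=(d, d))
        # steps 1-2: sweep every column downwards and upwards, concatenate the two states
        V = np.stack([np.concatenate([rnn_sweep(patches[:, j], Wv, Uv),
                                      rnn_sweep(patches[::-1, j], Wv, Uv)[::-1]], axis=-1)
                      for j in range(J)], axis=1)                          # (I, J, 2d)
        # step 3: sweep every row of V rightwards and leftwards, concatenate again
        H = np.stack([np.concatenate([rnn_sweep(V[i], Wh, Uh),
                                      rnn_sweep(V[i, ::-1], Wh, Uh)[::-1]], axis=-1)
                      for i in range(I)], axis=0)                          # (I, J, 2d)
        return H

    rng = np.random.default_rng(0)
    print(renet_layer(rng.normal(size=(28, 28, 1)), wp=2, hp=2, d=8, rng=rng).shape)  # (14, 14, 16)

Stacking several such layers (each consuming the previous layer's H as its input "image") gives the deep ReNet described above.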
