Gated Feedback Recurrent Neural Networks (arXiv1502) & ReNet (arXiv1505)

SLIDE 1

Outline

Gated Feedback Recurrent Neural Networks. arXiv1502.

  • Introduction: RNN & Gated RNN
  • Gated Feedback Recurrent Neural Networks (GF-RNN)
  • Experiments: Character-level Language Modeling & Python Program Evaluation

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv1505.

  • Introduction
  • ReNet: 4 RNNs that sweep over lower-layer features in 4 directions
  • Experiments: MNIST & CIFAR-10 & Street View House Numbers

LU Yangyang, luyy11@pku.edu.cn
May 2015 @ KERE Seminar

SLIDE 2

Authors

  • Gated Feedback Recurrent Neural Networks
    • arXiv.org. 9 Feb 2015 - 18 Feb 2015.
    • Junyoung Chung, Caglar Gulcehre, Kyunghyun Cho, Yoshua Bengio (University of Montreal)
  • ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks
    • arXiv.org. 3 May 2015.
    • Francesco Visin, Kyle Kastner, Kyunghyun Cho, Matteo Matteucci, Aaron Courville, Yoshua Bengio (University of Montreal)

SLIDE 3

Outline

Gated Feedback Recurrent Neural Networks. arXiv1502.

  • Introduction: RNN & Gated RNN
  • Gated Feedback Recurrent Neural Networks (GF-RNN)
  • Experiments: Character-level Language Modeling & Python Program Evaluation

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv1505.
SLIDE 4

Recurrent Neural Networks (RNN)

For Sequence Modeling

  • Can process a sequence of arbitrary length
  • Recursively applies a transition function to its internal hidden state for each symbol of the input sequence
  • Can theoretically capture any long-term dependency in an input sequence
  • In practice, it is difficult to train an RNN to actually do so

h_t = f(x_t, h_{t-1}) = \phi(W x_t + U h_{t-1})

p(x_1, x_2, \ldots, x_T) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_T \mid x_1, \ldots, x_{T-1})

p(x_{t+1} \mid x_1, \ldots, x_t) = g(h_t)

Figure: A single-layer RNN
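
To make the transition concrete, here is a minimal numpy sketch of the single-layer RNN above, assuming \phi = tanh and a softmax read-out for g(.); the sizes and the weight names W, U, V are illustrative choices, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
vocab, hidden = 5, 8
W = rng.normal(size=(hidden, vocab))   # input-to-hidden weights
U = rng.normal(size=(hidden, hidden))  # hidden-to-hidden (recurrent) weights
V = rng.normal(size=(vocab, hidden))   # hidden-to-output weights for g(.)

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def step(x_t, h_prev):
    """One transition: consume one one-hot symbol, return (h_t, p(x_{t+1} | h_t))."""
    h_t = np.tanh(W @ x_t + U @ h_prev)   # phi = tanh
    return h_t, softmax(V @ h_t)          # g = softmax read-out

h = np.zeros(hidden)
for symbol in [0, 3, 1]:                  # a toy input sequence
    x = np.eye(vocab)[symbol]
    h, p_next = step(x, h)
print(p_next)                              # distribution over the next symbol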

SLIDE 5

Gated Recurrent Neural Networks [1]

LSTM & GRU

An LSTM unit:

  h_t^j = o_t^j \tanh(c_t^j)
  c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j
  \tilde{c}_t^j = \tanh(W_c x_t + U_c h_{t-1})^j
  f_t^j = \sigma(W_f x_t + U_f h_{t-1} + V_f c_t)^j
  i_t^j = \sigma(W_i x_t + U_i h_{t-1} + V_i c_t)^j
  o_t^j = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t)^j

A GRU unit:

  h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j \tilde{h}_t^j
  z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j
  \tilde{h}_t^j = \tanh(W x_t + U(r_t \odot h_{t-1}))^j
  r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j

[1] Chung, J., et al. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv, 2014.
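
A minimal numpy sketch of the GRU equations above (a single layer, so the per-unit index j is just a vector component); the weight shapes and names are illustrative, not taken from the paper's code.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_h = 4, 6
W,  U  = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))
Wz, Uz = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))
Wr, Ur = rng.normal(size=(n_h, n_in)), rng.normal(size=(n_h, n_h))

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev):
    z = sigmoid(Wz @ x_t + Uz @ h_prev)             # update gate z_t
    r = sigmoid(Wr @ x_t + Ur @ h_prev)             # reset gate r_t
    h_tilde = np.tanh(W @ x_t + U @ (r * h_prev))   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde         # new hidden state h_t

h = np.zeros(n_h)
for x in rng.normal(size=(3, n_in)):   # a toy input sequence of length 3
    h = gru_step(x, h)
print(h.shape)                          # (6,)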

SLIDE 6

Gated Recurrent Neural Networks

Modifying the RNN architecture

  • Using a gated activation function:
    • the long short-term memory unit (LSTM): a memory cell, an input gate, a forget gate, and an output gate
    • the gated recurrent unit (GRU): a reset gate and an update gate
    • can contain both fast-changing and slow-changing components
  • Stacking multiple levels of recurrent layers
    • partitioned and grouped hidden units allow feedback information at multiple timescales
  • Achieved promising results in both classification and generation tasks

⇒ Gated-feedback RNN (GF-RNN): learning multiple adaptive timescales

SLIDE 7

GF-RNN: Overview

Figure: A Clockwork RNN

  • A sequence often consists of both slow-moving and fast-moving components:
    • slow-moving: long-term dependencies
    • fast-moving: short-term dependencies
  • El Hihi & Bengio (1995): an RNN can capture dependencies of different timescales more easily and efficiently when its hidden units are explicitly partitioned into groups that correspond to different timescales.
  • The clockwork RNN (CW-RNN) (Koutnik et al., 2014): update the i-th module only when t \bmod 2^{i-1} = 0
  ⇒ GF-RNN generalizes the CW-RNN by allowing the model to adaptively adjust the connectivity pattern between the hidden layers in consecutive time-steps.
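
A tiny sketch of the clockwork update schedule mentioned above, i.e. updating the i-th module only when t mod 2^(i-1) = 0; the function name and module indexing are illustrative, not the CW-RNN paper's exact formulation.

def modules_to_update(t, num_modules):
    # Module i ticks only at time-steps divisible by 2**(i-1).
    return [i for i in range(1, num_modules + 1) if t % (2 ** (i - 1)) == 0]

for t in range(1, 9):
    print(t, modules_to_update(t, num_modules=4))
# t=1 -> [1], t=2 -> [1, 2], t=4 -> [1, 2, 3], t=8 -> [1, 2, 3, 4]:
# low-numbered modules update every step (fast), high-numbered ones rarely (slow).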

SLIDE 8

GF-RNN: Overview (cont.)

  • Partition the hidden units into multiple modules: each module corresponds to a different layer in a stack of recurrent layers.
  • Compared to the CW-RNN: no explicit rate is set for each module; the modules are hierarchically stacked and thus operate at different timescales.
  • Each module is fully connected to all the other modules across the stack and to itself.
  • The global reset gate: gates the recurrent connection between two modules based on the current input and the previous states of the hidden layers.

SLIDE 9

GF-RNN: The global reset gate

  • h_t^i: the hidden state of the i-th layer at time-step t
  • w_g^{i→j}, u_g^{i→j}: weights for the input and for the hidden states of all layers at time-step t−1
  • g^{i→j}: controls the signal from the i-th layer h_{t−1}^i to the j-th layer h_t^j, based on the input and the previous hidden states

SLIDE 10

GF-RNN: The global reset gate

Information flows:

  • stacked RNN & GF-RNN: lower layers → upper layers
  • GF-RNN only: lower layers ← upper layers (finer timescale ← coarser timescale)

A gated-feedback RNN: a fully-connected recurrent transition and global reset gates.
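
To make the gate concrete, here is a reconstruction of how the global reset gate and the gated tanh-unit update are commonly written for the GF-RNN, in the notation above (not copied verbatim from the paper): h_t^{j−1} is the state of the layer below at time t (with h_t^0 = x_t), h*_{t−1} is the concatenation of all layers' hidden states at time t−1, and L is the number of layers.

g^{i \to j} = \sigma\left( w_g^{i \to j} h_t^{j-1} + u_g^{i \to j} h^{*}_{t-1} \right)

h_t^j = \tanh\left( W^{j-1 \to j} h_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U^{i \to j} h_{t-1}^{i} \right)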

SLIDE 11

GF-RNN: Different Units of Practical Implementation

  • tanh units
  • LSTM units & GRU units: only use the global reset gates when computing the new state

LSTM:

  h_t^j = o_t^j \tanh(c_t^j)
  c_t^j = f_t^j c_{t-1}^j + i_t^j \tilde{c}_t^j
  f_t^j = \sigma(W_f x_t + U_f h_{t-1} + V_f c_t)^j
  i_t^j = \sigma(W_i x_t + U_i h_{t-1} + V_i c_t)^j
  o_t^j = \sigma(W_o x_t + U_o h_{t-1} + V_o c_t)^j

GRU:

  h_t^j = (1 - z_t^j) h_{t-1}^j + z_t^j \tilde{h}_t^j
  z_t^j = \sigma(W_z x_t + U_z h_{t-1})^j
  r_t^j = \sigma(W_r x_t + U_r h_{t-1})^j
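
In line with the note that the global reset gates enter only the new-state computation, a reconstruction of the gated candidate states in the same notation (again not copied verbatim from the paper; W_c^{j−1→j} and U_c^{i→j} denote layer-wise input and feedback weight matrices, L the number of layers):

\tilde{c}_t^j = \tanh\left( W_c^{j-1 \to j} h_t^{j-1} + \sum_{i=1}^{L} g^{i \to j}\, U_c^{i \to j} h_{t-1}^{i} \right) \quad \text{(LSTM)}

\tilde{h}_t^j = \tanh\left( W^{j-1 \to j} h_t^{j-1} + r_t^j \odot \sum_{i=1}^{L} g^{i \to j}\, U^{i \to j} h_{t-1}^{i} \right) \quad \text{(GRU)}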

SLIDE 12

Experiment Tasks

BOTH: representative examples of discrete sequence modeling.

Objective function: minimize the negative log-likelihood of the training sequences.

  • Character-level language modeling:
    • English Wikipedia: 100MB of characters
    • Contents: Latin alphabets, non-Latin alphabets, XML markup and special characters
    • Vocabulary: 205 characters (plus one token for unknown characters)
    • Train/CV/Test: 90MB/5MB/5MB
    • Performance measure: the average number of bits-per-character (BPC, E[−log₂ P(x_{t+1} | h_t)]); a small sketch follows this list
  • Python program evaluation:
    • Goal: to generate or predict the correct return value of a given Python script
    • Input: Python scripts (including addition, multiplication, subtraction, for-loops, variable assignment, logical comparison and if-else statements)
    • Output: the predicted return value of the given Python script
    • Input/Output: 41/31 symbols
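
A minimal sketch of the BPC measure from the list above, assuming the trained model's probability for the actual next character at each time-step is available; the function and variable names are illustrative.

import numpy as np

def bits_per_character(probs_of_targets):
    """Average number of bits-per-character: E[-log2 P(x_{t+1} | h_t)].

    probs_of_targets: the model's probabilities assigned to the actual next
    character at each time-step of the test sequence.
    """
    probs = np.asarray(probs_of_targets, dtype=np.float64)
    return float(np.mean(-np.log2(probs)))

# Toy usage: a model that assigns probability 0.5 to every correct character
# scores exactly 1 bit per character.
print(bits_per_character([0.5, 0.5, 0.5]))  # -> 1.0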
SLIDE 13

Examples for Python Program Evaluation [2]

[2] Zaremba, Wojciech and Sutskever, Ilya. Learning to Execute. arXiv preprint arXiv:1410.4615, 2014.

SLIDE 14

Experiments: Character-level Language Modeling

  • The sizes of the models:
  • Optimization: RMSProp and momentum
  • Test-set BPC of models trained on the Hutter dataset for 100 epochs:
SLIDE 15

Experiments: Character-level Language Modeling (cont.)

Text Generation based on character-level language modeling:

  • Given the seed in the left-most column (bold-faced font), the models predict the next 200-300 characters.
  • Tabs, spaces and new-line characters are also generated by the models.
SLIDE 16

Experiments: Python Program Evaluation

Using an RNN encoder-decoder approach:

  • Python scripts → ENCODER (50 time-steps) → h_t → DECODER → character-level results

SLIDE 17

Outline

Gated Feedback Recurrent Neural Networks. arXiv1502.

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv1505.

  • Introduction
  • ReNet: 4 RNNs that sweep over lower-layer features in 4 directions
  • Experiments: MNIST & CIFAR-10 & Street View House Numbers

SLIDE 18

Introduction

Object Recognition:

  • Convolutional Neural Networks (CNN): LeNet-5
    • based on a local context window
  • Recurrent Neural Networks:
    • Graves and Schmidhuber (2009): a multi-dimensional RNN
    • ReNet: purely uni-dimensional RNNs that replace each convolutional layer (conv. + pooling) in the CNN with 4 RNNs sweeping over lower-layer features in 4 directions: ↑, ↓, ←, →
    • each feature activation is computed at a specific location with respect to the whole image

SLIDE 19

A one-layer ReNet

  • The input image: x ∈ R^{w×h×c} (width, height, feature dimensionality)
  • Given a patch size w_p × h_p, split the input image x into a set of I × J (non-overlapping) patches X = {x_{ij}}, x_{ij} ∈ R^{w_p×h_p×c}
  • 1. Sweep the image vertically with 2 RNNs (↑, ↓): each RNN takes as input one (flattened) patch at a time and updates its hidden state, working along each column j of the split input image X.
  • 2. Concatenate the intermediate hidden states z^F_{i,j}, z^R_{i,j} at each location (i, j) to get a composite feature map V = {z_{i,j}}_{i=1,...,I; j=1,...,J}, z_{i,j} ∈ R^{1×h_p×2d} (d: the number of recurrent units)
  • 3. Sweep V horizontally with 2 RNNs (←, →) in a similar manner. The resulting feature map H = {z′_{i,j}}, z′_{i,j} ∈ R^{1×1×2d}: the features of the original image patch x_{i,j} in the context of the whole image.

The deep ReNet: stack multiple φ's (φ: the function from X to H)
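
A minimal numpy sketch of one ReNet layer following the three steps above, using plain tanh RNNs in place of the LSTM/GRU units the paper uses; all names, sizes, random weights and the zero initial states are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def rnn_sweep(seq, Wx, Wh):
    """Run a simple tanh RNN over a sequence of vectors; return all hidden states."""
    h = np.zeros(Wh.shape[0])
    states = []
    for x in seq:
        h = np.tanh(Wx @ x + Wh @ h)
        states.append(h)
    return np.stack(states)

def renet_layer(x, wp, hp, d):
    """x: (w, h, c) image; wp, hp: patch size; d: recurrent units per direction."""
    w, h, c = x.shape
    I, J = w // wp, h // hp
    # Split into non-overlapping, flattened patches: X[i, j] in R^{wp*hp*c}.
    X = np.array([[x[i*wp:(i+1)*wp, j*hp:(j+1)*hp, :].ravel()
                   for j in range(J)] for i in range(I)])
    p = wp * hp * c
    Wxf, Whf = rng.normal(size=(d, p)), rng.normal(size=(d, d))
    Wxr, Whr = rng.normal(size=(d, p)), rng.normal(size=(d, d))
    # Step 1: vertical sweep, two RNNs run down and up each column j.
    V = np.zeros((I, J, 2 * d))
    for j in range(J):
        col = X[:, j]
        zF = rnn_sweep(col, Wxf, Whf)               # top-to-bottom states
        zR = rnn_sweep(col[::-1], Wxr, Whr)[::-1]   # bottom-to-top states
        V[:, j] = np.concatenate([zF, zR], axis=1)  # step 2: concatenate states
    # Step 3: horizontal sweep over V, row by row, with two more RNNs.
    Wxf2, Whf2 = rng.normal(size=(d, 2 * d)), rng.normal(size=(d, d))
    Wxr2, Whr2 = rng.normal(size=(d, 2 * d)), rng.normal(size=(d, d))
    H = np.zeros((I, J, 2 * d))
    for i in range(I):
        row = V[i]
        zF = rnn_sweep(row, Wxf2, Whf2)
        zR = rnn_sweep(row[::-1], Wxr2, Whr2)[::-1]
        H[i] = np.concatenate([zF, zR], axis=1)
    return H  # H[i, j] describes patch (i, j) in the context of the whole image

# Toy usage: a 28x28 single-channel image, 2x2 patches, 8 units per direction.
H = renet_layer(rng.normal(size=(28, 28, 1)), wp=2, hp=2, d=8)
print(H.shape)  # (14, 14, 16)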

SLIDE 20

Differences between LeNet and ReNet

BOTH: apply the same set of filters to patches of the input image or to the feature map from the layer below.

LeNet:

  • Patches: overlap
  • Many levels of convolution+pooling layers: to detect redundant features from different parts of the image
  • Using max-pooling to achieve local translation invariance
  • Highly parallelizable, due to the independence of computing activations at each layer

SLIDE 21

Differences between LeNet and ReNet

ReNet:

  • Patches: do not overlap
  • The lateral connections: help extract a more compact feature representation of the input image at each layer
  • The lateral connections in ReNet can emulate the local competition among features induced by the max-pooling in LeNet
  • Not easily parallelizable, due to the sequential nature of RNNs
SLIDE 22

Datasets

  • MNIST:
    • 70K handwritten digits from 0 to 9
    • 28 × 28 pixels, each pixel: grayscale in [0, 255]
    • Train/Dev/Test: 50K/10K/10K
  • CIFAR-10:
    • 60K images: a curated subset of the 80M tiny images dataset
    • 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck
    • 32 × 32 pixels, each pixel: 3 color channels (red-green-blue)
    • Train/Dev/Test: 40K/10K/10K
  • Street View House Numbers (SVHN):
    • cropped images representing house numbers captured by Google Street View vehicles as part of the Google Maps mapping process
    • the images consist of digits 0 through 9 with pixel values in the range [0, 255]
    • 32 × 32 pixels, each pixel: grayscale in [0, 255]
    • Train/Dev/Test: 543,949/60,439/26,032

Data Augmentation:

  • Flippling: flipped each sample horizontally with 25% chance, flipped it vertically

with 25% chance

  • Shifting:
  • ← 2 pixels (25% chance), → 2 pixels (25% chance)
  • ↑ 2 pixels (25% chance), ↓ 2 pixels (25% chance)
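
A minimal numpy sketch of the flipping/shifting augmentation above; treating the shifts as zero-padded and mutually exclusive per axis is an assumption, as is the (height, width, channels) array layout.

import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    out = img.copy()
    # Flipping: horizontal flip with 25% chance, vertical flip with 25% chance.
    if rng.random() < 0.25:
        out = out[:, ::-1]
    if rng.random() < 0.25:
        out = out[::-1, :]
    # Shifting: move 2 pixels left/right/up/down, each with 25% chance,
    # zero-padding the vacated border.
    if rng.random() < 0.25:
        out = np.roll(out, -2, axis=1); out[:, -2:] = 0   # left
    elif rng.random() < 0.25:
        out = np.roll(out, 2, axis=1);  out[:, :2] = 0    # right
    if rng.random() < 0.25:
        out = np.roll(out, -2, axis=0); out[-2:, :] = 0   # up
    elif rng.random() < 0.25:
        out = np.roll(out, 2, axis=0);  out[:2, :] = 0    # down
    return out

augmented = augment(rng.normal(size=(32, 32, 3)))
print(augmented.shape)  # (32, 32, 3)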
SLIDE 23

Model Architectures

  • Unit implementation:
    • GRU: MNIST, CIFAR-10
    • LSTM: SVHN
  • General architecture:
    • N_RE: the number of ReNet layers
    • d_RE: the feature dimensionality of the ReNet layers
    • N_FC: the number of fully-connected layers
    • d_FC: the feature dimensionality of the fully-connected layers
    • f_FC: the type of hidden units in the fully-connected layers
SLIDE 24

Experiment Results

SLIDE 25

Discussion

  • Choice of Recurrent Units
    • ReNet performs well independently of the specific implementation of the recurrent units (either LSTM or GRU).
    • Gated recurrent units, either the GRU or the LSTM, outperform a usual sigmoidal unit (an affine transformation followed by an element-wise sigmoid function).
  • Analysis of the Trained ReNet
    • ReNet performs comparably to deep convolutional neural networks, which are the de facto standard for object recognition.
    • ReNet does not outperform state-of-the-art convolutional neural networks on any of the three benchmark datasets.
    • The authors expect that the internal behavior of ReNet differs significantly from that of LeNet, which needs further investigation.
  • Computationally Efficient Implementation
    • ReNet is less parallelizable due to the sequential nature of RNNs.
    • ReNet allows the forward and backward RNNs of each sweep to be run independently of each other, which allows for parallel computation.

SLIDE 26

Summary

Gated Feedback Recurrent Neural Networks. arXiv1502.

  • Stacked RNN + fully-connected recurrent transitions + global reset gates
  • Three kinds of implementation units: tanh, LSTM, GRU
  • Experiments: Character-level Text Generation
    • English Wikipedia
    • Better than a conventional stacked RNN
  • Experiments: Python Program Evaluation
    • Predict return values of Python scripts
    • Using an RNN encoder-decoder approach

ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks. arXiv1505.

  • A one-layer ReNet: 4 RNNs sweeping the input image in 4 directions
  • Stacked ReNet layers → deep architectures
  • Two kinds of implementation units: LSTM, GRU
  • Experiments: MNIST & CIFAR-10 & Street View House Numbers
    • Using flipping and shifting as data augmentation
    • ReNet performs comparably to deep CNNs, but does not outperform state-of-the-art CNNs on any of the three tasks.