SLIDE 1 2018 IEEE International Workshop on Machine Learning for Signal Processing (MLSP’18)
Recurrent Neural Networks with Flexible Gates using Kernel Activation Functions
Authors: S. Scardapane, S. Van Vaerenbergh, D. Comminiello, S. Totaro and A. Uncini
SLIDE 2
Contents
Introduction
  - Overview
Gated recurrent networks
  - Formulation
Proposed gate with flexible sigmoid
  - Kernel activation function
  - KAF generalization for gates
Experimental validation
  - Experimental setup
  - Results
Conclusion and future works
  - Summary and future outline
SLIDE 3
Content at a glance
Setting: Gated units have become an integral part of deep learning (e.g., LSTMs, highway networks, ...).
State-of-the-art: Only a small number of studies address how to design more flexible gate architectures (e.g., Gao and Glowacka, ACML 2016).
Objective: Design an enhanced gate, with a small number of additional adaptable parameters, to model a wider range of gating functions.
SLIDE 7
Gated unit: basic model
Definition: (vanilla) gated unit. For a generic input x we have:

g(x) = σ(Wx) ⊙ f(x),  (1)

where σ(·) is the sigmoid function, ⊙ is the element-wise multiplication, and f(x) is a generic network component.

Notable examples:
◮ LSTM networks (Hochreiter and Schmidhuber, 1997).
◮ Gated recurrent units (Cho et al., 2014).
◮ Highway networks (Srivastava et al., 2015).
◮ Neural arithmetic logic units (Trask et al., 2018).
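As a concrete illustration, Eq. (1) can be sketched in a few lines of plain Python (a minimal sketch with hand-rolled linear algebra; the names `sigmoid` and `gated_unit` are ours, not from the paper):

```python
import math

def sigmoid(s):
    # Logistic sigmoid, squashing any real value into (0, 1).
    return 1.0 / (1.0 + math.exp(-s))

def gated_unit(W, x, f_x):
    # g(x) = sigma(W x) ⊙ f(x): each component of f(x) is scaled
    # by a gate value in (0, 1) computed from the linear map W x.
    z = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    return [sigmoid(zi) * fi for zi, fi in zip(z, f_x)]

# With W = 0 every gate equals sigmoid(0) = 0.5, so f(x) is halved.
out = gated_unit([[0.0, 0.0], [0.0, 0.0]], [1.0, 2.0], [2.0, 4.0])
```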
SLIDE 8 Gated recurrent unit (GRU)
At each time step t we receive xt ∈ R^d and update the internal state ht−1 as:

ut = σ(Wu xt + Vu ht−1 + bu),  (2)
rt = σ(Wr xt + Vr ht−1 + br),  (3)
ht = (1 − ut) ⊙ ht−1 + ut ⊙ tanh(Wh xt + Uh (rt ⊙ ht−1) + bh),  (4)

where (2)-(3) are the update gate and reset gate, respectively.
Cho, K. et al., 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. EMNLP 2014.
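A single GRU step, Eqs. (2)-(4), can be sketched in plain Python (a hedged sketch: the parameter names mirror the slide, the helper functions are ours, and each weight matrix is passed as a list of rows):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def matvec(M, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(m * vj for m, vj in zip(row, v)) for row in M]

def add(*vs):
    # Element-wise sum of several vectors.
    return [sum(t) for t in zip(*vs)]

def had(a, b):
    # Element-wise (Hadamard) product.
    return [ai * bi for ai, bi in zip(a, b)]

def gru_step(x, h, Wu, Vu, bu, Wr, Vr, br, Wh, Uh, bh):
    # Eq. (2): update gate.
    u = [sigmoid(s) for s in add(matvec(Wu, x), matvec(Vu, h), bu)]
    # Eq. (3): reset gate.
    r = [sigmoid(s) for s in add(matvec(Wr, x), matvec(Vr, h), br)]
    # Eq. (4): candidate state gated by r, then a convex combination
    # of the old state and the candidate, weighted by u.
    cand = [math.tanh(s)
            for s in add(matvec(Wh, x), matvec(Uh, had(r, h)), bh)]
    return [(1 - ui) * hi + ui * ci for ui, hi, ci in zip(u, h, cand)]
```

With all weights at zero, both gates sit at σ(0) = 0.5 and the candidate is tanh(0) = 0, so the new state is simply half the old one.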
SLIDE 11 Training the network (classification)
We are given N sequences {xi(t)}_{i=1}^{N} with labels yi = 1, . . . , C; hi is the internal state of the GRU after processing the i-th sequence. This is fed through another layer with a softmax activation function for classification:

ŷi = softmax(Wo hi + bo),  (5)

where Wo and bo are the output layer parameters. We then minimize the average cross-entropy between the real classes and the predicted classes:

J(θ) = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{C} yij log(ŷij),  (6)

where yij = 1 if yi = j and 0 otherwise.
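Eqs. (5)-(6) amount to a softmax readout followed by an average cross-entropy loss; a minimal plain-Python sketch (the function names are ours):

```python
import math

def softmax(z):
    # Subtract the max before exponentiating, for numerical stability.
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]

def cross_entropy(logits, labels):
    # Average of -log p(correct class); labels are integer class indices.
    n = len(logits)
    return -sum(math.log(softmax(z)[y])
                for z, y in zip(logits, labels)) / n
```

For uniform logits over C classes the loss is exactly log C, a handy sanity check for an untrained classifier.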
SLIDE 15 Summary of the proposal
Key items of our proposal:
1. Maintain the linear component, but replace the sigmoid element-wise operation with a generalized sigmoid function.
2. We extend the kernel activation function (KAF), a recently proposed non-parametric activation function.
3. We modify the KAF to ensure that it behaves correctly as a gating function.
SLIDE 18 Basic structure of the KAF
A KAF models each activation function in terms of a kernel expansion over D terms as:

KAF(s) = Σ_{i=1}^{D} αi κ(s, di),  (7)

where:
1. {αi}_{i=1}^{D} are the mixing coefficients;
2. {di}_{i=1}^{D} are the dictionary elements;
3. κ(·, ·) : R × R → R is a 1D kernel function.
Scardapane, S., Van Vaerenbergh, S., Totaro, S. and Uncini, A., 2017. Kafnets: kernel-based non-parametric activation functions for neural networks. arXiv preprint arXiv:1707.04035.
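Eq. (7) with a Gaussian kernel κ(s, d) = exp(−γ (s − d)²) can be sketched in plain Python (the function names are ours):

```python
import math

def gaussian_kernel(s, d, gamma=1.0):
    # 1D Gaussian kernel centred on the dictionary element d.
    return math.exp(-gamma * (s - d) ** 2)

def kaf(s, alpha, dictionary, gamma=1.0):
    # Kernel expansion over D terms: KAF(s) = sum_i alpha_i * kappa(s, d_i).
    return sum(a * gaussian_kernel(s, d, gamma)
               for a, d in zip(alpha, dictionary))
```

The shape of the resulting function is controlled entirely by the mixing coefficients, while the dictionary and the kernel bandwidth γ set where and how locally each coefficient acts.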
SLIDE 19 Extending KAFs for gated units
We cannot use a KAF straightforwardly because it is unbounded and potentially vanishing to zero (e.g., with the Gaussian kernel). We use the following modified formulation for the flexible gate:

σKAF(s) = σ( (1/2) KAF(s) + (1/2) s ),  (8)

As in the original KAF, the dictionary elements are fixed (by uniform sampling around 0), while we adapt everything else.
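The modified gate in Eq. (8) wraps the sigmoid around a residual mix of the KAF and its raw input; a self-contained plain-Python sketch (names are ours):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def kaf(s, alpha, dictionary, gamma=1.0):
    # Gaussian-kernel expansion, Eq. (7).
    return sum(a * math.exp(-gamma * (s - d) ** 2)
               for a, d in zip(alpha, dictionary))

def sigma_kaf(s, alpha, dictionary, gamma=1.0):
    # Eq. (8): the residual term s/2 keeps the gate sigmoid-like
    # even in regions where the Gaussian expansion vanishes.
    return sigmoid(0.5 * kaf(s, alpha, dictionary, gamma) + 0.5 * s)
```

With all mixing coefficients at zero the gate degenerates to σ(s/2), so its output stays bounded in (0, 1), unlike a raw Gaussian KAF, which would push the gate towards σ(0) far from the dictionary.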
SLIDE 22 Visualizing the new gates
[Figure: value of the gate vs. activation over [−5, 5], three panels: (a) γ = 1.0, (b) γ = 0.5, (c) γ = 0.1.]
Figure 1: Random samples of the proposed flexible gates with Gaussian kernel and different hyperparameters.
SLIDE 23 Initializing the mixing coefficients
To simplify optimization we initialize the mixing coefficients to approximate the identity function:

α = (K + εI)⁻¹ d,  (9)

where K is the kernel matrix computed on the dictionary elements, d is the vector of dictionary elements, and ε > 0 is a small constant. We then use a different set of mixing coefficients for each reset gate and update gate.

[Figure: gate output vs. activation over [−2, 2] after initialization.]
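The initialization in Eq. (9) is a small ridge-regularized linear solve; a plain-Python sketch using Gaussian elimination (the solver and names are ours; in practice one would use a linear-algebra library):

```python
import math

def solve(A, b):
    # Gaussian elimination with partial pivoting for a small dense system.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c]
                              for c in range(r + 1, n))) / M[r][r]
    return x

def init_alpha(dictionary, gamma=1.0, eps=1e-6):
    # alpha = (K + eps I)^(-1) d, with K_ij = kappa(d_i, d_j),
    # so that KAF(d_i) ~ d_i: the gate starts near the identity map.
    K = [[math.exp(-gamma * (di - dj) ** 2) + (eps if i == j else 0.0)
          for j, dj in enumerate(dictionary)]
         for i, di in enumerate(dictionary)]
    return solve(K, list(dictionary))
```

After this initialization the kernel expansion reproduces each dictionary element almost exactly, which is what makes the initial gate behave like the identity in the plotted range.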
SLIDE 24
Contents
Introduction Overview Gated recurrent networks Formulation Proposed gate with flexible sigmoid Kernel activation function KAF generalization for gates Experimental validation Experimental setup Results Conclusion and future works Summary and future outline
SLIDE 25 Sequential MNIST benchmark
◮ [Row-wise MNIST (R-MNIST)] Each image is processed sequentially, row-by-row, i.e., we have sequences of length 28, each element represented by the values of 28 pixels.
◮ [Pixel-wise MNIST (P-MNIST)] Each image is represented as a sequence of 784 pixels, read from left to right and from top to bottom of the original image.
◮ [Permuted P-MNIST (PP-MNIST)] Similar to P-MNIST, but the order of the pixels is shuffled using a (fixed) permutation matrix.
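The three variants can be sketched as simple reshapings of a 28×28 image (a plain-Python sketch; function names are ours):

```python
import random

def row_wise(img):
    # R-MNIST: 28 time steps, each a 28-dimensional row of pixels.
    return [list(row) for row in img]

def pixel_wise(img):
    # P-MNIST: 784 time steps, each a single pixel value,
    # scanned left-to-right, top-to-bottom.
    return [[p] for row in img for p in row]

def permuted_pixel_wise(img, perm):
    # PP-MNIST: the same pixels, reordered by a fixed permutation.
    seq = pixel_wise(img)
    return [seq[i] for i in perm]

img = [[0.0] * 28 for _ in range(28)]
perm = list(range(784))
random.Random(0).shuffle(perm)  # fixed seed -> the same permutation for every image
```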
SLIDE 26 Models and hyperparameters
1. We compare standard GRUs and GRUs with the proposed flexible gating function.
2. GRUs have 100 units and we include an additional batch normalization step to stabilize training.
3. We train with Adam on mini-batches of 32 elements, with an initial learning rate of 0.001, and we clip all gradient updates (in norm) to 1.0.
4. For the proposed gate, we use the Gaussian kernel and initialize the dictionary from 10 elements equispaced in [−4.0, 4.0].
5. We compute the average accuracy of the model every 25 iterations on the validation set, stopping whenever accuracy has not improved for at least 500 iterations.
SLIDE 31
Accuracy on the test set
Dataset     GRU (standard)    GRU (proposed)
R-MNIST     98.29 ± 0.01      98.67 ± 0.02
P-MNIST     89.50 ± 5.64      97.34 ± 0.61
PP-MNIST    86.41 ± 6.71      96.10 ± 0.93
Table 1: Average test accuracy obtained by a standard GRU compared with a GRU endowed with the proposed flexible gates (with standard deviation).
SLIDE 32 Evolution of the loss and validation accuracy
[Figure: (a) training loss vs. epoch for the standard GRU and the proposed GRU; (b) validation accuracy vs. epoch for the same two models.]
Figure 2: Convergence results on the P-MNIST dataset for a standard GRU and the proposed GRU.
SLIDE 33 Distribution of the kernel’s bandwidths
[Histogram: number of cells vs. value of γ, roughly in the range 0.05-0.30.]
Figure 3: Sample histogram of the values for the kernel’s hyperparameters, after training, for the reset gate of the GRU.
SLIDE 34 Ablation study
[Figure: test accuracy (range ≈ 0.96-0.99) for the configurations Normal, Rand, No-Residual, and Rand+No-Residual.]
Figure 4: Average results of an ablation study on the R-MNIST dataset. Rand: we initialize the mixing coefficients randomly. No-Residual: we remove the residual connection in (8). With a dashed red line we show the performance of a standard GRU.
SLIDE 36
Summary
◮ We proposed an extension of the standard gating component used in most gated RNNs.
◮ To this end, we extend the kernel activation function in order to make its shape always consistent with a sigmoid-like behavior.
◮ Experiments show that the proposed architecture achieves superior results (in terms of test accuracy), while at the same time converging faster (and more reliably).
◮ Need more experiments with other gated RNNs, applications, and interpretability of the resulting functions with respect to the task at hand.
SLIDE 40
Questions?