Dropout improves Recurrent Neural Networks for Handwriting - PowerPoint PPT Presentation

Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour tb@a2ia.com , jl@a2ia.com 1/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

Outline RNN for Handwritten Text Line Recognition 1 Offline Handwritten Text Recognition Recurrent Neural Networks (RNN) Dropout for RNN 2 Experiments 3 Improvement of RNN Improvement of the complete recognition system 2/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Outline RNN for Handwritten Text Line Recognition 1 Offline Handwritten Text Recognition Recurrent Neural Networks (RNN) Dropout for RNN 2 Experiments 3 Improvement of RNN Improvement of the complete recognition system 3/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Offline Handwritten Text Recognition Dear Charlize. You are cordially invited to the grand opening of my new art gallery intitled «The new era of Media Music and paintings». on July 17th 2012 P.S: UR presence is obligatory due to your great help of launching my career. Line segmentation in the front-end “Temporal Classification”: Variable-length 1D or 2D input �→ 1D target sequence (different length) 4/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Modeling: Recurrent Neural Networks (RNN) State-of-the-art in Handwritten Text Recognition Task: Image (2D sequence) �→ 1D sequence of characters CTC _ _ a _ p i d _ p 1.0 I t w a s s l e n d i n t e r r e t a t i o n N-way softmax ..... 0.8 0.6 0.4 2 N 0.2 2 20 N 6 0.0 0 20 40 60 80 100 120 140 160 50 N 20 10 6 2 Input image MDLSTM Convolutional Sum & Tanh MDLSTM Convolutional Sum & Tanh MDLSTM Fully-connected Sum Collapse Block: 2 x 2 Features: 2 Input size: 2x4 Features: 10 Input: 2 x 4 Features: 50 Features: N Features: 6 Features: 20 RNN Network Architecture (Graves & Schmidhuber, 2008) 1 Multi-Directional layers of LSTM unit “Long-Short Term Memory” – 2D recurrence in 4 possible directions Convolutions: parameterized subsampling layers Collapse layer: from 2D to 1D (output ∼ log P ) 5/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Modeling: Recurrent Neural Networks (RNN) State-of-the-art in Handwritten Text Recognition Task: Image (2D sequence) �→ 1D sequence of characters CTC _ _ a _ p i d _ p 1.0 I t w a s s e l n d i n t r e e r t a t i o n N-way softmax ..... 0.8 0.6 0.4 2 N 0.2 2 20 N 6 0.0 0 20 40 60 80 100 120 140 160 50 N 20 10 6 2 Input image MDLSTM Convolutional Sum & Tanh MDLSTM Convolutional Sum & Tanh MDLSTM Fully-connected Sum Collapse Block: 2 x 2 Features: 2 Input size: 2x4 Features: 10 Input: 2 x 4 Features: 50 Features: N Features: 6 Features: 20 RNN Network Architecture (Graves & Schmidhuber, 2008) 1 Multi-Directional layers of LSTM unit “Long-Short Term Memory” – 2D recurrence in 4 possible directions Convolutions: parameterized subsampling layers Collapse layer: from 2D to 1D (output ∼ log P ) CTC Training (“Connectionist Temporal Classification”) 2 The network can output all possible symbols and also a blank output Minimization of the Negative Log-Likelihood − log( P ( Y | X )) (NLL) 5/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Modeling: Recurrent Neural Networks (RNN) State-of-the-art in Handwritten Text Recognition The recurrent neurons are Long Short-Term Memory (LSTM) units Forget gate Output gate hidden layer hidden layer (i, j-1) (i-1, j) (i, j) (i, j) (i-1, j) (i, j+1) input layer input layer (i, j) (i, j) hidden layer hidden layer Input gate (i, j-1) (i, j) (i, j) Forget gate (i, j+1) (i+1, j) (i+1, j) input layer input layer (i, j) (i, j) 6/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Loss function: Connectionist Temporal Classification (CTC) Deal with several possible alignments between two 1D sequences Target Sequence ''Tea'' ∅ length U' = 2 U +1 A ∅ m E � − log P ( Y | X ) c 5 8 , 1 ∅ T ∅ 34,2cm ≥ U length T U = 3: Number of target symbols T : Number of RNN outputs ∝ image width Basic decoding strategy (without lexicon neither language model): �→ [ ∅ . . . ]T . . . [ ∅ . . . ]E . . . [ ∅ . . . ]A . . . [ ∅ . . . ] “ TEA ” 7/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Loss function: Connectionist Temporal Classification (CTC) Deal with several possible alignments between two 1D sequences Target Sequence ''Tee'' ∅ length U' = 2 U +1 E ∅ m E � − log P ( Y | X ) c 5 8 , 1 ∅ T ∅ 34,2cm ≥ U +1 length T U = 3: Number of target symbols T : Number of RNN outputs ∝ image width Basic decoding strategy (without lexicon neither language model): �→ [ ∅ . . . ]T . . . [ ∅ . . . ]E . . . ∅ . . . E . . . [ ∅ . . . ] “ TEE ” 7/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

RNN for Handwritten Text Line Recognition Optimization: Stochastic Gradient Descent Simple and efficient No mathematical guarantee (no chance to converge to the real global minimum) But popular with deep networks: works well in practice! (find “good” local minima) for ( input, target ) in Oracle() do output= RNN.Forward( input ) outGrad= CTC NLL.Gradient( output, target ) paramGrad= RNN.BackwardGradient( input, ..., outGrad ) RNN.Update( paramGrad ) end for 8/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

Dropout for RNN Outline RNN for Handwritten Text Line Recognition 1 Offline Handwritten Text Recognition Recurrent Neural Networks (RNN) Dropout for RNN 2 Experiments 3 Improvement of RNN Improvement of the complete recognition system 9/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

Dropout for RNN Dropout General Principle [Krizhevsky & Hinton, 2012] Training: Randomly set to 0 intermediate activities (*) with probability p (typically p = 0 . 5) (*) neurons outputs usually in [ − 1 , 1], [0 , 1] or [0 , ∞ ) ∼ Sampling from 2 N different architectures that share weights Decoding: All intermediate activities are scaled, by 1 − p ∼ Geometric mean of the outputs from 2 N models Featured in award-winning convolutional networks (ImageNet) 10/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

Dropout for RNN Dropout Dropout with recurrent layer Recurrent connections are kept untouched Dropout can be implemented as separated layer (outputs identical to inputs, except at dropped locations) 11/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

Dropout for RNN Dropout Overview of the full network Dropout Dropout Dropout CTC N-way softmax ..... 2 N 20 2 N 6 50 N 20 10 6 2 MDLSTM Fully-connected Sum Collapse Input image MDLSTM Convolutional Sum & Tanh MDLSTM Convolutional Sum & Tanh Block: 2 x 2 Features: 2 Input: 2x4 Features: 10 Input: 2x4 Features: 50 Features: N Stride: 2x4 Stride: 2x4 Features: 6 Features: 20 After recurrent LSTM layers Before feed-forward layers (convolutional and linear layers) 12/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

Experiments Outline RNN for Handwritten Text Line Recognition 1 Offline Handwritten Text Recognition Recurrent Neural Networks (RNN) Dropout for RNN 2 Experiments 3 Improvement of RNN Improvement of the complete recognition system 13/22 Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th´ eodore Bluche Christopher Kermorvant J´ erˆ ome Louradour

Dropout improves Recurrent Neural Networks for Handwriting - PowerPoint PPT Presentation

Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th eodore Bluche Christopher Kermorvant J er ome Louradour tb@a2ia.com , jl@a2ia.com 1/22 Dropout improves Recurrent Neural Networks for Handwriting

CHAPTER II I CHAPTER I Recurrent Neural Networks Recurrent Neural Networks CHAPTER II : I :

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

CS6501: Deep Learning for Visual Recognition Recurrent Neural Networks (RNNs) Todays Class

CS6501: Deep Learning for Visual Recognition Recurrent Neural Networks (RNNs) Todays Class

The Power of Linear Recurrent Neural Networks Neural Networks Was knnen lineare rekurrente

Recurrent Neural Network Xiaogang Wang xgwang@ee.cuhk.edu.hk February 26, 2019 cuhk Xiaogang

Jumpout : Improved Dropout for Deep Neural Networks with ReLUs Shengjie Wang, Tianyi Zhou, Jeff

CHAPTER VII VII CHAPTER Learning in Recurrent Networks Learning in Recurrent Networks CHAPTER

Recurrent Neural Networks Greg Mori - CMPT 419/726 Goodfellow, Bengio, and Courville: Deep

Understanding LSTM Networks Recurrent Neural Networks An unrolled recurrent neural network The

CSEP 517: Natural Language Processing Recurrent Neural Networks Autumn 2018 Luke Zettlemoyer

Dropout in RNNs Following a VI Interpretation Yarin Gal yg279@cam.ac.uk Unless specified

Recurrent Neural Networks CS60010: Deep Learning Abir Das IIT Kharagpur Mar 11, 2020

Computa(on through dynamics Using recurrent neural networks to unveil mechanism in neural

IN5550 Neural Methods in Natural Language Processing Recurrent Neural Networks Stephan Oepen

NLP Programming Tutorial 8 - Recurrent Neural Nets Graham Neubig Nara Institute of Science and

Evolving Artificial Neural Networks Tim Kovacs Evolving ANNs 1 of 23 Introduction Adapting

Quantum Neural Network (QNN) - Connecting Quantum and Brain with Optics - Yoshihisa Yamamoto NTT

Neural Networks as Function Primitives Software/Hardware Support with X-FILES/DANA Schuyler

Neural networks regularization through representation learning - presentation of my PhD work -

Additive Manufacturing Defect Detection using Neural Networks James Ferguson May 16, 2016

Neural Networks applied to Blending Challenges Sowmya Kamath, Patricia Burchat Blending

General Neural Networks Compositions of linear maps and component-wise non- linearities Neural

BrainChip Holdings Ltd. Corporate Presentation 1 BrainChip Overview BrainChip is a leading

Dropout improves Recurrent Neural Networks for Handwriting - PowerPoint PPT Presentation

Dropout improves Recurrent Neural Networks for Handwriting Recognition Vu Pham Th eodore Bluche Christopher Kermorvant J er ome Louradour tb@a2ia.com , jl@a2ia.com 1/22 Dropout improves Recurrent Neural Networks for Handwriting

CHAPTER II I CHAPTER I Recurrent Neural Networks Recurrent Neural Networks CHAPTER II : I :

Sequential Data with Neural Networks Recurrent Neural Networks Sequential input / output Greg

CS6501: Deep Learning for Visual Recognition Recurrent Neural Networks (RNNs) Todays Class

CS6501: Deep Learning for Visual Recognition Recurrent Neural Networks (RNNs) Todays Class

The Power of Linear Recurrent Neural Networks Neural Networks Was knnen lineare rekurrente

Recurrent Neural Network Xiaogang Wang xgwang@ee.cuhk.edu.hk February 26, 2019 cuhk Xiaogang

Jumpout : Improved Dropout for Deep Neural Networks with ReLUs Shengjie Wang*, Tianyi Zhou*, Jeff

CHAPTER VII VII CHAPTER Learning in Recurrent Networks Learning in Recurrent Networks CHAPTER

Recurrent Neural Networks Greg Mori - CMPT 419/726 Goodfellow, Bengio, and Courville: Deep

Understanding LSTM Networks Recurrent Neural Networks An unrolled recurrent neural network The

CSEP 517: Natural Language Processing Recurrent Neural Networks Autumn 2018 Luke Zettlemoyer

Dropout in RNNs Following a VI Interpretation Yarin Gal yg279@cam.ac.uk Unless specified

Recurrent Neural Networks CS60010: Deep Learning Abir Das IIT Kharagpur Mar 11, 2020

Computa(on through dynamics Using recurrent neural networks to unveil mechanism in neural

IN5550 Neural Methods in Natural Language Processing Recurrent Neural Networks Stephan Oepen

NLP Programming Tutorial 8 - Recurrent Neural Nets Graham Neubig Nara Institute of Science and

Evolving Artificial Neural Networks Tim Kovacs Evolving ANNs 1 of 23 Introduction Adapting

Quantum Neural Network (QNN) - Connecting Quantum and Brain with Optics - Yoshihisa Yamamoto NTT

Neural Networks as Function Primitives Software/Hardware Support with X-FILES/DANA Schuyler

Neural networks regularization through representation learning - presentation of my PhD work -

Additive Manufacturing Defect Detection using Neural Networks James Ferguson May 16, 2016

Neural Networks applied to Blending Challenges Sowmya Kamath, Patricia Burchat Blending

General Neural Networks Compositions of linear maps and component-wise non- linearities Neural

BrainChip Holdings Ltd. Corporate Presentation 1 BrainChip Overview BrainChip is a leading

Jumpout : Improved Dropout for Deep Neural Networks with ReLUs Shengjie Wang, Tianyi Zhou, Jeff