Dropout improves Recurrent Neural Networks for Handwriting Recognition
Vu Pham, Théodore Bluche, Christopher Kermorvant, Jérôme Louradour
tb@a2ia.com, jl@a2ia.com

Outline
1. RNN for Handwritten Text Line Recognition
   - Offline Handwritten Text Recognition
   - Recurrent Neural Networks (RNN)
2. Dropout for RNN
3. Experiments
   - Improvement of RNN
   - Improvement of the complete recognition system


RNN for Handwritten Text Line Recognition
Offline Handwritten Text Recognition
[Figure: handwritten sample page reading "Dear Charlize. You are cordially invited to the grand opening of my new art gallery intitled «The new era of Media Music and paintings». on July 17th 2012 P.S: UR presence is obligatory due to your great help of launching my career."]
- Line segmentation in the front-end
- "Temporal Classification": variable-length 1D or 2D input → 1D target sequence (of different length)


RNN for Handwritten Text Line Recognition
Modeling: Recurrent Neural Networks (RNN)
State-of-the-art in Handwritten Text Recognition
Task: image (2D sequence) → 1D sequence of characters
[Figure: RNN network architecture (Graves & Schmidhuber, 2008). Input image (block: 2 x 2) → MDLSTM (features: 2) → Convolutional (input: 2 x 4, features: 6) → Sum & Tanh → MDLSTM (features: 10) → Convolutional (input: 2 x 4, features: 20) → Sum & Tanh → MDLSTM (features: 50) → Fully-connected (features: N) → Sum → Collapse → N-way softmax → CTC]
1. Multi-directional layers of LSTM ("Long Short-Term Memory") units
   - 2D recurrence in 4 possible directions
   - Convolutions: parameterized subsampling layers
   - Collapse layer: from 2D to 1D (output ∼ log P)
2. CTC ("Connectionist Temporal Classification") training
   - The network can output all possible symbols and also a blank output
   - Minimization of the negative log-likelihood (NLL): − log P(Y | X)

RNN for Handwritten Text Line Recognition
Modeling: Recurrent Neural Networks (RNN)
State-of-the-art in Handwritten Text Recognition
The recurrent neurons are Long Short-Term Memory (LSTM) units.
[Figure: 2D LSTM cell at position (i, j) with input gate, forget gates, and output gate. The cell reads the input layer at (i, j) and the hidden layer at neighbouring positions such as (i-1, j), (i, j-1), (i+1, j), and (i, j+1), depending on the scanning direction.]

RNN for Handwritten Text Line Recognition
Loss function: Connectionist Temporal Classification (CTC)
CTC deals with the several possible alignments between two 1D sequences.
[Figure: alignment lattice for the target sequence "Tea". One axis is the target extended with blanks, ∅ T ∅ E ∅ A ∅, of length U' = 2U + 1; the other axis is the sequence of RNN outputs, of length T ≥ U. The loss is − log P(Y | X).]
- U = 3: number of target symbols
- T: number of RNN outputs (∝ image width)
Basic decoding strategy (with neither a lexicon nor a language model):
[∅ ...] T ... [∅ ...] E ... [∅ ...] A ... [∅ ...] → "TEA"
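The loss − log P(Y | X) sums the probability of every alignment in the lattice. The following is a minimal sketch of the standard CTC forward recursion, assuming the per-frame symbol probabilities are already available as a list of dictionaries. The function name `ctc_nll`, the use of "_" for the blank, and the toy probabilities are illustrative assumptions, not taken from the slides.

```python
import math

BLANK = "_"  # stands for the blank symbol (illustrative choice)

def ctc_nll(probs, target):
    """Return -log P(target | probs) using the CTC forward recursion.

    probs:  list of T dicts mapping each symbol (including BLANK) to its
            probability at that frame.
    target: string of U symbols.
    """
    # Extended target: a blank around every symbol, length U' = 2U + 1.
    ext = [BLANK]
    for ch in target:
        ext += [ch, BLANK]
    n = len(ext)

    # alpha[s]: total probability of the partial alignments ending in ext[s].
    alpha = [0.0] * n
    alpha[0] = probs[0][BLANK]
    if n > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new = [0.0] * n
        for s in range(n):
            total = alpha[s]                 # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]        # advance by one position
            # Skipping a blank is allowed only between two different symbols.
            if s >= 2 and ext[s] != BLANK and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * probs[t][ext[s]]
        alpha = new

    # Valid alignments end on the last symbol or on the final blank.
    likelihood = alpha[-1] + (alpha[-2] if n > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf
```

With two frames that are uniform over {"_", "A"}, the target "A" has three valid alignments (AA, A_, _A), each with probability 0.25, so the function returns − log 0.75.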

RNN for Handwritten Text Line Recognition
Loss function: Connectionist Temporal Classification (CTC)
CTC deals with the several possible alignments between two 1D sequences.
[Figure: alignment lattice for the target sequence "Tee". One axis is the target extended with blanks, ∅ T ∅ E ∅ E ∅, of length U' = 2U + 1; the other axis is the sequence of RNN outputs, of length T ≥ U + 1. The loss is − log P(Y | X).]
- U = 3: number of target symbols
- T: number of RNN outputs (∝ image width)
Basic decoding strategy (with neither a lexicon nor a language model); the blank between the two E's is required, otherwise they would merge into one:
[∅ ...] T ... [∅ ...] E ... ∅ ... E ... [∅ ...] → "TEE"
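The basic decoding strategy can be sketched in a few lines of Python: take the most probable symbol at each frame, merge consecutive repeats, then drop the blanks. The function names `collapse` and `best_path_decode`, the "_" blank, and the input format (one dictionary of probabilities per frame) are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # stands for the blank symbol (illustrative choice)

def collapse(path):
    """Map a frame-level label path to text: merge repeats, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(s for s in merged if s != BLANK)

def best_path_decode(probs):
    """Pick the most probable symbol at each frame, then collapse the path."""
    path = [max(frame, key=frame.get) for frame in probs]
    return collapse(path)
```

For example, `collapse("__TT_EE__A_")` gives "TEA", whereas `collapse("TTEE")` gives "TE": without a blank between them, the two E's are merged.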

RNN for Handwritten Text Line Recognition
Optimization: Stochastic Gradient Descent
- Simple and efficient
- No mathematical guarantee of converging to the true global minimum
- But popular with deep networks: works well in practice (finds "good" local minima)

for (input, target) in Oracle() do
    output = RNN.Forward(input)
    outGrad = CTC_NLL.Gradient(output, target)
    paramGrad = RNN.BackwardGradient(input, ..., outGrad)
    RNN.Update(paramGrad)
end for
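The same four steps (forward, loss gradient, backward, update) can be seen in a runnable toy version. This sketch deliberately swaps in a one-variable linear model and a squared-error loss for the RNN and the CTC loss, so it only illustrates the shape of the loop; `sgd_fit`, the learning rate, and the number of epochs are hypothetical choices.

```python
import random

def sgd_fit(samples, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by per-sample stochastic gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)                        # visit samples in random order
        for x, target in data:
            output = w * x + b                   # forward pass
            out_grad = 2.0 * (output - target)   # gradient of (output - target)^2
            grad_w, grad_b = out_grad * x, out_grad  # backward pass
            w -= lr * grad_w                     # parameter update
            b -= lr * grad_b
    return w, b
```

On noiseless samples drawn from y = 2x + 1, for instance `[(x, 2 * x + 1) for x in (-1, 0, 1, 2)]`, the returned parameters approach w = 2 and b = 1.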


Dropout for RNN
Dropout: General Principle [Krizhevsky & Hinton, 2012]
- Training: randomly set intermediate activities (*) to 0 with probability p (typically p = 0.5)
  ∼ sampling from 2^N different architectures that share weights, where N is the number of units subject to dropout
- Decoding: all intermediate activities are scaled by 1 − p
  ∼ geometric mean of the outputs from the 2^N models
- Featured in award-winning convolutional networks (ImageNet)
(*) neuron outputs, usually in [−1, 1], [0, 1] or [0, ∞)
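The two regimes can be written down directly. This is a minimal sketch on plain Python lists; the function names `dropout_train` and `dropout_decode` and the optional `rng` argument are assumptions for illustration.

```python
import random

def dropout_train(activations, p=0.5, rng=random):
    """Training: set each activation to 0 independently with probability p."""
    return [0.0 if rng.random() < p else a for a in activations]

def dropout_decode(activations, p=0.5):
    """Decoding: keep every activation but scale it by 1 - p."""
    return [(1.0 - p) * a for a in activations]
```

The scaling at decoding time matches the training regime on average: an activation a survives training with probability 1 − p, so its expected value there is (1 − p) · a.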

Dropout for RNN
Dropout with a recurrent layer
- Recurrent connections are kept untouched
- Dropout can be implemented as a separate layer (outputs identical to inputs, except at dropped locations)
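A small sketch makes the distinction concrete. It uses a one-unit tanh recurrence as a stand-in for the LSTM layers (an assumption made to keep the example short); the function name and the weights `w_in` and `w_rec` are hypothetical. The hidden state passed to the next time step is never masked; only the copy handed to the next layer goes through dropout.

```python
import math
import random

def rnn_layer_with_dropout(inputs, w_in, w_rec, p=0.5, training=True, rng=random):
    """Run a one-unit tanh recurrence and apply dropout to its outputs only.

    Returns (hidden, outputs): the hidden states used by the recurrence and
    the values passed on to the next layer after the dropout layer.
    """
    h = 0.0
    hidden, outputs = [], []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h)       # recurrent connection untouched
        hidden.append(h)
        if training:
            out = 0.0 if rng.random() < p else h  # identical to h unless dropped
        else:
            out = (1.0 - p) * h                   # decoding: scale by 1 - p
        outputs.append(out)
    return hidden, outputs
```

Because the mask is applied outside the recurrence, the hidden-state trajectory is the same whether dropout is active or not; only the values seen by the following layer change.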

Dropout for RNN
Dropout: Overview of the full network
[Figure: the full network with three dropout layers, one after each MDLSTM layer. Input image (block: 2 x 2) → MDLSTM (features: 2) → Dropout → Convolutional (input: 2 x 4, stride: 2 x 4, features: 6) → Sum & Tanh → MDLSTM (features: 10) → Dropout → Convolutional (input: 2 x 4, stride: 2 x 4, features: 20) → Sum & Tanh → MDLSTM (features: 50) → Dropout → Fully-connected (features: N) → Sum → Collapse → N-way softmax → CTC]
Dropout is applied:
- after recurrent LSTM layers
- before feed-forward layers (convolutional and linear layers)

Experiments
