Fully Convolutional Networks for Handwriting Recognition


  1. Fully Convolutional Networks for Handwriting Recognition
     Felipe Petroski Such*, Dheeraj Peri*, Frank Brockler†, Paul Hutkowski†, Raymond Ptucha*
     *Rochester Institute of Technology, †Kodak Alaris (Such et al., ICFHR'18)

     Background
     • Offline handwriting recognition remains a difficult problem because the same information can be written in a virtually infinite number of ways.
     • Convolutional Neural Networks (CNNs) have been applied to handwriting recognition with good success.
     • Recurrent Neural Networks (RNNs) are useful for arbitrary-length sequences, and Connectionist Temporal Classification (CTC) works well as a post-correction step.
     [Figure: handwritten letter reading "I am truly touched by your kind contribution to my birthday presents & grateful for your good wishes. Winston Churchill". Note: some believe the letter is a forgery.]

  2. Workflow – Word Extraction
     • Document segmentation: SegNet or a similar model labels each pixel by type and can grow regions to orthogonal boundaries.
     • Block segmentation: a modified XY-tree or similar suggests rectilinear splits.
     • Both are used together to define paragraph, sentence, and word blocks.

     Workflow – Word Recognition
     • Preprocessing: fix skew, rotation, and contrast.
     • Prediction: CNNs, HMMs, and LSTMs used together.
     • Post-processing: CTC during training and testing; a language model at test time.

  3. Proposed Method
     • Character classification without the need for:
       – Preprocessing: no deskewing.
       – A predefined lexicon of words: works on surnames, phone numbers, and street addresses.
       – Post-processing: no RNN or CTC needed.
     • Utilizes Fully Convolutional Networks (FCNs) to handle arbitrary sequence lengths.
       – FCNs are faster to train than RNNs and more robust.
       – CTC can still be used, but we found it hard to converge.
     • A single architecture works on arbitrary words as well as words from a lexicon.

     High-Level Pipeline
     • Vocabulary CNN: predicts a word label for common words such as 'his', 'her', 'the'. If the confidence exceeds a threshold g, the block is done.
     • Length CNN: predicts the number of symbols N, then the block is resampled to 32 × 16N.
     • Symbol FCN: predicts 2N+1 symbols, where each symbol is separated by a blank space.
     • Language model (optional step): when the block is known to come from a lexicon of words, use vocabulary matching by minimizing the character error rate.
     A code sketch of this pipeline follows below.
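     The sketch below summarizes the pipeline logic as described on this slide. It is illustrative only, assuming PyTorch-style models; the threshold g, the resize call, and the helpers COMMON_WORDS, decode_symbols, and edit_distance (the latter is spelled out after the edit-distance slides) are hypothetical placeholders, not the authors' implementation.

     ```python
     # Illustrative sketch of the high-level pipeline (not the authors' code).
     import torch.nn.functional as F

     def recognize_word(block, vocab_cnn, length_cnn, symbol_fcn, g=0.9, lexicon=None):
         """block: a 1 x 1 x H x W grayscale word-image tensor."""
         # 1) Vocabulary CNN: try to classify the block as one common word.
         vocab_probs = F.softmax(vocab_cnn(block), dim=1)
         conf, word_id = vocab_probs.max(dim=1)
         if conf.item() > g:                              # confident: stop early
             return COMMON_WORDS[word_id.item()]          # hypothetical lookup table

         # 2) Length CNN: predict the number of symbols N.
         n = int(length_cnn(block).argmax(dim=1).item())

         # 3) Resample to a canonical 32 x 16N block and run the symbol FCN,
         #    which emits 2N+1 predictions (symbols separated by blanks).
         resized = F.interpolate(block, size=(32, 16 * n),
                                 mode="bilinear", align_corners=False)
         symbols = symbol_fcn(resized).argmax(dim=1).squeeze(0).tolist()
         predicted = decode_symbols(symbols)              # hypothetical: drop blanks -> string

         # 4) Optional language model: pick the lexicon word with the
         #    minimum character error rate (edit distance).
         if lexicon is not None:
             predicted = min(lexicon, key=lambda w: edit_distance(predicted, w))
         return predicted
     ```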

  4. Vocabulary and Length CNNs
     The same architecture is used for both networks, differing only in the output size V:
     C(64,3,3)-C(64,3,3)-C(64,3,3)-P(2)-C(128,3,3)-C(128,3,3)-C(256,3,3)-P(2)-C(256,3,3)-C(512,3,3)-C(512,3,3)-P(2)-C(256,4,16)-FC(V)-SoftMax
     where C(D,H,W) stands for a convolution with filter dimensions H × W and depth D, P(2) represents a 2 × 2 pooling layer with stride 2, and each convolutional layer is followed by batch norm and ReLU.
     • For the vocabulary CNN, V ≈ 1000.
     • For the length CNN, V = 32 (but V can be any value, or the head can be a regression).
     [Figure: architecture diagram; a 32 × 128 input is reduced to a 4 × 16 feature map before the final C(256,4,16) layer.]
     A PyTorch sketch of this specification follows below.
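     The code below is a minimal PyTorch rendering of the C(…)-FC(V)-SoftMax specification above. The single-channel 32 × 128 input and "same" padding on the 3 × 3 convolutions are assumptions for illustration, not details taken from the paper's code.

     ```python
     # Minimal sketch of the vocabulary/length backbone described on the slide.
     import torch.nn as nn

     def conv_bn_relu(in_ch, out_ch, kh, kw, pad):
         # C(D,H,W): convolution followed by batch norm and ReLU, per the slide.
         return nn.Sequential(
             nn.Conv2d(in_ch, out_ch, (kh, kw), padding=pad),
             nn.BatchNorm2d(out_ch),
             nn.ReLU(inplace=True),
         )

     class VocabOrLengthCNN(nn.Module):
         """V ~= 1000 for the vocabulary head, V = 32 for the length head."""
         def __init__(self, V):
             super().__init__()
             self.features = nn.Sequential(
                 conv_bn_relu(1, 64, 3, 3, 1), conv_bn_relu(64, 64, 3, 3, 1),
                 conv_bn_relu(64, 64, 3, 3, 1), nn.MaxPool2d(2),      # 32x128 -> 16x64
                 conv_bn_relu(64, 128, 3, 3, 1), conv_bn_relu(128, 128, 3, 3, 1),
                 conv_bn_relu(128, 256, 3, 3, 1), nn.MaxPool2d(2),    # 16x64 -> 8x32
                 conv_bn_relu(256, 256, 3, 3, 1), conv_bn_relu(256, 512, 3, 3, 1),
                 conv_bn_relu(512, 512, 3, 3, 1), nn.MaxPool2d(2),    # 8x32 -> 4x16
                 conv_bn_relu(512, 256, 4, 16, 0),                    # C(256,4,16) -> 1x1
             )
             self.fc = nn.Linear(256, V)

         def forward(self, x):                  # x: B x 1 x 32 x 128
             f = self.features(x).flatten(1)    # B x 256
             return self.fc(f)                  # logits; softmax applied at inference

     # vocab_cnn = VocabOrLengthCNN(V=1000); length_cnn = VocabOrLengthCNN(V=32)
     ```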

  5. Symbol FCN
     The symbol network combines two paths over the resampled 32 × 16N block:
     • Symbol detail path: stacks of 3 × 3 convolutions with pooling, reducing the block to a 512-channel feature map that is 2N wide.
     • Context path: 3 × 3 convolutions with pooling followed by fully connected layers (1024 units, ReLU); the resulting context vector is tiled across the width and added to the detail path.
     The head is a 4 × 4 × 512 convolution with 1 × 2 padding producing a 1024-channel map, followed by a fully convolutional 3 × 1 × 1024 layer with a softmax over the N_s symbol classes, yielding 2N+1 predictions.
     • The vertical pad gives forgiveness for up/down shifts: each prediction can be thought of as three estimates.
     • The horizontal pad gives the 2N+1 outputs: the activation map is 2N wide, padded by 2 on the left and right, and convolved with a filter of width 4.
     [Figure: Symbol FCN architecture with context path and symbol detail path; example predictions for N = 1 and N = 3.]
     A sketch of the head's padding arithmetic follows below.
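     The padding arithmetic can be checked with a small sketch. The code below is illustrative, not the authors' implementation; it assumes the detail path delivers a 512-channel, 4 × 2N feature map and that "1 × 2 pad" means 1 pixel top/bottom and 2 pixels left/right.

     ```python
     # Sketch of the symbol-FCN head: 4x4 conv with (1, 2) padding, then a 3x1
     # fully convolutional classifier with softmax over N_s symbols.
     import torch
     import torch.nn as nn

     class SymbolHead(nn.Module):
         def __init__(self, num_symbols):          # num_symbols = N_s
             super().__init__()
             # 4x4 conv, pad (1, 2): height 4 -> 3, width 2N -> 2N+1
             self.conv = nn.Conv2d(512, 1024, kernel_size=(4, 4), padding=(1, 2))
             # 3x1 layer: each of the 2N+1 outputs is a linear combination
             # of a 3 x 1024 activation patch.
             self.classify = nn.Conv2d(1024, num_symbols, kernel_size=(3, 1))

         def forward(self, detail):                # detail: B x 512 x 4 x 2N
             x = torch.relu(self.conv(detail))     # B x 1024 x 3 x (2N+1)
             logits = self.classify(x)             # B x N_s x 1 x (2N+1)
             return logits.squeeze(2)              # B x N_s x (2N+1)

     # Shape check for N = 3 (a 3-symbol word): width 2N = 6 in, 2N+1 = 7 out.
     head = SymbolHead(num_symbols=100)
     out = head(torch.randn(1, 512, 4, 6))
     print(out.shape)                              # torch.Size([1, 100, 7])
     ```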

  6. Symbol FCN (continued)
     • The vertical pad gives forgiveness for up/down shifts: each prediction can be seen as three estimates.
     • The horizontal pad gives 2N+1 outputs.
     • A softmax is taken over the N_s symbols.
     • Each of the 2N+1 predictions is a linear combination of a 3 × 1024 activation map.
     [Figure: example word images and their predicted symbols for N = 1, 3, 4, and 9.]

  7. Example: edit distance between the predicted word "tymme" and the comparison word "time".
     The predicted word runs across the columns and the comparison word down the rows; the first row and column are initialized to 0…5 and 0…4. The first cell to fill, (t, t), is marked "?".

             t   y   m   m   e
         0   1   2   3   4   5
      t  1   ?   -   -   -   -
      i  2   -   -   -   -   -
      m  3   -   -   -   -   -
      e  4   -   -   -   -   -

     Each cell is filled using the recurrence
     D(i,j) = min( D(i-1,j) + 1, D(i,j-1) + 1, D(i-1,j-1) + cost ),
     where cost = 0 if the predicted character equals the comparison character, and cost = 1 otherwise.

  8. Filling (t, t): the characters match, so cost = 0. Match! Pass along the previous error.

             t   y   m   m   e
         0   1   2   3   4   5
      t  1   0   -   -   -   -
      i  2   -   -   -   -   -
      m  3   -   -   -   -   -
      e  4   -   -   -   -   -

  9. Filling the rest of the first column: miss! +1 to insert i, then +1 each to insert m and e.

             t   y   m   m   e
         0   1   2   3   4   5
      t  1   0   -   -   -   -
      i  2   1   -   -   -   -
      m  3   2   -   -   -   -
      e  4   3   -   -   -   -

 10. Moving to the second column: (t, y) is a miss, +1 to delete y; (i, y) is a miss, +1 to replace y with i.

             t   y   m   m   e
         0   1   2   3   4   5
      t  1   0   1   -   -   -
      i  2   1   1   -   -   -
      m  3   2   -   -   -   -
      e  4   3   -   -   -   -

 11. Completing the second column: (m, y) is a miss, +1 to replace y with m or +1 to insert m; (e, y) is a miss, +1 to replace y with e or +1 to insert e.

             t   y   m   m   e
         0   1   2   3   4   5
      t  1   0   1   -   -   -
      i  2   1   1   -   -   -
      m  3   2   2   -   -   -
      e  4   3   3   -   -   -

 12. Starting the third column: (t, m) is a miss, +1 to delete m; (i, m) is a miss, +1 to replace m with i or +1 to delete y.

             t   y   m   m   e
         0   1   2   3   4   5
      t  1   0   1   2   -   -
      i  2   1   1   2   -   -
      m  3   2   2   -   -   -
      e  4   3   3   -   -   -

     Filling the table to the end in this way yields the edit distance in the bottom-right cell; a short code sketch follows below.
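     The recurrence walked through on these slides is the standard Levenshtein edit distance. The compact implementation below is an illustrative sketch, not code from the paper; comments map the three terms to the insert/delete/replace narration above.

     ```python
     # Edit distance: D[i][j] = min(D[i-1][j] + 1, D[i][j-1] + 1, D[i-1][j-1] + cost),
     # with cost = 0 on a character match and 1 otherwise.
     def edit_distance(predicted: str, comparison: str) -> int:
         rows, cols = len(comparison) + 1, len(predicted) + 1
         D = [[0] * cols for _ in range(rows)]
         for j in range(cols):                 # first row/column count pure
             D[0][j] = j                       # insertions/deletions
         for i in range(rows):
             D[i][0] = i
         for i in range(1, rows):              # comparison word down the side
             for j in range(1, cols):          # predicted word across the top
                 cost = 0 if predicted[j - 1] == comparison[i - 1] else 1
                 D[i][j] = min(D[i - 1][j] + 1,         # +1 to insert the comparison char
                               D[i][j - 1] + 1,         # +1 to delete the predicted char
                               D[i - 1][j - 1] + cost)  # match (0) or replace (+1)
         return D[-1][-1]

     print(edit_distance("tymme", "time"))  # 2: replace y with i, delete one m
     ```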
