

SLIDE 1

Accelerating and Compressing LSTM based Model for Online Handwritten Chinese Character Recognition

Reporter: Zecheng Xie, South China University of Technology

August 5th, 2018

SLIDE 2
Outline

• Motivation
• Difficulties
• Our approach
• Experiments
• Conclusion

SLIDE 3

Motivation

• Online handwritten Chinese character recognition (HCCR) is widely used in pen-input and touch-screen devices

SLIDE 4

Motivation

• The difficulties of online HCCR
  • Large number of character classes
  • Similarity between characters
  • Diversity of writing styles
• Deep learning models are powerful but raise other problems
  • Models are too large → require a large footprint and much memory
  • Computationally expensive → consume much energy
• The advantages of deploying models on mobile devices
  • Eases server pressure
  • Better service latency
  • Can work offline
  • Privacy protection
  • …

Our goal: build fast and compact models for on-device inference

SLIDE 5

Difficulties of deploying LSTM-based online HCCR models on mobile devices

• 3755 classes
  • The model tends to be large
• Dependencies between time steps
  • Make inference slow
  • Inherent to RNNs, unlikely to be changed

Figure: an RNN unrolled over time [1]

[1] http://colah.github.io/posts/2015-08-Understanding-LSTMs/

SLIDE 6

Our approach

• The proposed framework

The baseline model → Reconstruct baseline with SVD → Prune redundant connections → Cluster remaining connections

SLIDE 7

Our approach

• Data preprocessing and augmentation
  • Randomly remove 30% of the points in each character
  • Perform coordinate normalization
  • Remove redundant points using the method proposed in [1]:
    • a point that is too close to the point before it
    • a middle point that is nearly collinear with the points before and after it
• Data transform and feature extraction [1]:

$$ (x_i, y_i, s_i), \quad i = 1, 2, 3, \dots $$

is mapped to

$$ \big(x_i,\ y_i,\ \Delta x_i,\ \Delta y_i,\ \mathbb{1}(s_i = s_{i+1}),\ \mathbb{1}(s_i \neq s_{i+1})\big), \quad i = 1, 2, 3, \dots $$

where $s_i$ is the stroke index of point $i$.

[1] X.-Y. Zhang et al., “Drawing and recognizing Chinese characters with recurrent neural network”, TPAMI, 2017
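
A minimal NumPy sketch of this feature transform (an illustration, not the authors' code; it assumes $\Delta x_i = x_{i+1} - x_i$ and a per-point stroke index $s_i$):

```python
import numpy as np

def extract_features(points):
    """Map a trajectory [(x_i, y_i, s_i), ...] to the 6-D features
    (x_i, y_i, dx_i, dy_i, 1(s_i == s_{i+1}), 1(s_i != s_{i+1}))."""
    pts = np.asarray(points, dtype=np.float32)   # shape (T, 3)
    x, y, s = pts[:, 0], pts[:, 1], pts[:, 2]
    dx, dy = x[1:] - x[:-1], y[1:] - y[:-1]      # deltas to the next point
    same = (s[:-1] == s[1:]).astype(np.float32)  # 1 if same stroke
    diff = 1.0 - same                            # 1 at a stroke boundary
    # the last point has no successor, so T points yield T-1 feature rows
    return np.stack([x[:-1], y[:-1], dx, dy, same, diff], axis=1)
```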

SLIDE 8

Our approach

• Data preprocessing and augmentation

[1] X.-Y. Zhang et al., “Drawing and recognizing Chinese characters with recurrent neural network”, TPAMI, 2017

SLIDE 9

Our approach

• Baseline model architecture
  • Input-100LSTM-512LSTM-512FC-3755FC-Output

Figure: the network unrolled from t = 1 to t = T (input → 100 LSTM → 512 LSTM → 512 FC → 3755 FC)
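
A hypothetical PyTorch rendering of this architecture (the layer widths come from the slide; the 6-D input size, the ReLU, and taking the last time step are assumptions):

```python
import torch
import torch.nn as nn

class BaselineHCCR(nn.Module):
    """Input-100LSTM-512LSTM-512FC-3755FC, as listed on the slide."""
    def __init__(self, in_dim=6, n_classes=3755):
        super().__init__()
        self.lstm1 = nn.LSTM(in_dim, 100, batch_first=True)
        self.lstm2 = nn.LSTM(100, 512, batch_first=True)
        self.fc1 = nn.Linear(512, 512)
        self.fc2 = nn.Linear(512, n_classes)

    def forward(self, x):            # x: (batch, T, in_dim)
        h, _ = self.lstm1(x)
        h, _ = self.lstm2(h)
        h = h[:, -1, :]              # hidden state at the last time step
        return self.fc2(torch.relu(self.fc1(h)))
```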

SLIDE 10

Our approach

• Reconstruct the network with singular value decomposition (SVD)

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$

Equivalently, with the four gates stacked:

$$
\begin{pmatrix} i_t \\ f_t \\ g_t \\ o_t \end{pmatrix}
=
\begin{pmatrix} \sigma \\ \sigma \\ \tanh \\ \sigma \end{pmatrix}
\ast
\left(
\begin{pmatrix} W_{xi} \\ W_{xf} \\ W_{xg} \\ W_{xo} \end{pmatrix} x_t +
\begin{pmatrix} W_{hi} \\ W_{hf} \\ W_{hg} \\ W_{ho} \end{pmatrix} h_{t-1} +
\begin{pmatrix} b_i \\ b_f \\ b_g \\ b_o \end{pmatrix}
\right)
$$

Main computation: the matrix-vector products $W_x x_t$ and $W_h h_{t-1}$
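
These equations transcribe directly into code; a NumPy sketch of one step (hidden size H, input size D, with the four gate blocks stacked in the order i, f, g, o as above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_x, W_h, b, H):
    """One time step. W_x: (4H, D), W_h: (4H, H), b: (4H,)."""
    z = W_x @ x_t + W_h @ h_prev + b        # the main computation
    i = sigmoid(z[0*H:1*H])                 # input gate
    f = sigmoid(z[1*H:2*H])                 # forget gate
    g = np.tanh(z[2*H:3*H])                 # cell candidate
    o = sigmoid(z[3*H:4*H])                 # output gate
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t
```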

SLIDE 11

Our approach

• Reconstruct the network with singular value decomposition (SVD)

The same stacked form, with the two dominant products isolated:

$$
\begin{pmatrix} i_t \\ f_t \\ g_t \\ o_t \end{pmatrix}
=
\begin{pmatrix} \sigma \\ \sigma \\ \tanh \\ \sigma \end{pmatrix}
\ast
\left( W_x x_t + W_h h_{t-1} + b \right)
$$

• Apply SVD to $W_x$ and $W_h$
  • $W_x$: input connections
  • $W_h$: hidden-hidden connections
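
A sketch of the factorization step in NumPy (the rank r and the stacked 4H x D / 4H x H shapes are illustrative assumptions):

```python
import numpy as np

def svd_factorize(W, r):
    """Approximate an m x n matrix W by U (m x r) @ N (r x n),
    keeping the r largest singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :r], s[:r, None] * Vt[:r, :]   # N = Sigma_r @ V_r^T

# factorize both weight blocks of an LSTM layer
H, D, r = 512, 100, 32                 # illustrative sizes
W_x = np.random.randn(4 * H, D)        # input connections
W_h = np.random.randn(4 * H, H)        # hidden-hidden connections
Ux, Nx = svd_factorize(W_x, r)
Uh, Nh = svd_factorize(W_h, r)
```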

SLIDE 12

Our approach

• Efficiency analysis of the SVD method
• Suppose $W \in \mathbb{R}^{m \times n}$; by SVD we have

$$ W_{m \times n} = U_{m \times n} \Sigma_{n \times n} V_{n \times n}^{T} $$

• By retaining a proper number of singular values,

$$ W_{m \times n} \approx U_{m \times r} \Sigma_{r \times r} V_{n \times r}^{T} = U_{m \times r} N_{r \times n} $$

• Replace $W_{m \times n}$ with $U_{m \times r} N_{r \times n}$
  • $W x \;\rightarrow\; U (N x)$
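
The substitution can be checked numerically (continuing the sketch above; `svd_factorize` is the hypothetical helper defined there):

```python
m, n, r = 512, 128, 32
W = np.random.randn(m, n)
x = np.random.randn(n)

U_r, N_r = svd_factorize(W, r)
y_exact   = W @ x              # costs m*n multiply-adds
y_lowrank = U_r @ (N_r @ x)    # costs r*n + m*r multiply-adds
# the gap is small only when W is close to rank r
print(np.max(np.abs(y_exact - y_lowrank)))
```
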
SLIDE 13

Our approach

• Efficiency analysis of the SVD method
• For a matrix-vector multiplication $Wx$, $W \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^{n \times 1}$, the acceleration rate and compression rate with $r$ singular values retained are

$$ S_a = S_c = \frac{mn}{mr + rn} $$

• If $m = 512$, $n = 128$, $r = 32$, then $S_a = S_c = 3.2$
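
The rate follows from counting multiply-adds and stored weights: $Wx$ costs $mn$, while $U(Nx)$ costs $mr + rn$; the same counts give the storage ratio. A one-line check with the slide's numbers:

```python
def svd_rate(m, n, r):
    """S_a = S_c = mn / (mr + rn)."""
    return (m * n) / (m * r + r * n)

print(svd_rate(512, 128, 32))   # 65536 / 20480 = 3.2
```
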
SLIDE 14

Our approach

• Adaptive drop weight (ADW) [1]
  • An improvement on "Deep Compression" [2], in which a hard threshold is set
  • ADW gradually prunes away the redundant connections in each layer that have small absolute values (by sorting them during retraining)
• After ADW the network becomes sparse; K-means based quantization is then applied to each layer for further compression (sketched after the references below)

[1] X. Xiao, L. Jin, et al., "Building fast and compact convolutional neural networks for offline handwritten Chinese character recognition", Pattern Recognition, 2017
[2] S. Han, et al., "Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding", ICLR, 2016
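
A simplified sketch of the two steps: magnitude pruning with a sorted (quantile) threshold in the spirit of ADW, then 1-D K-means weight sharing. The pruning fraction, cluster count, and one-shot schedule are illustrative assumptions; in the paper, pruning is interleaved with retraining.

```python
import numpy as np

def prune_smallest(W, frac):
    """Zero the fraction `frac` of weights with the smallest |value|.
    The threshold comes from sorting, not from a fixed hard cutoff."""
    thresh = np.quantile(np.abs(W), frac)
    mask = np.abs(W) > thresh
    return W * mask, mask

def kmeans_quantize(W, mask, k=64, iters=20):
    """Cluster the surviving weights into k shared values (1-D K-means)."""
    vals = W[mask]
    centers = np.linspace(vals.min(), vals.max(), k)   # linear init
    for _ in range(iters):
        assign = np.argmin(np.abs(vals[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = vals[assign == j].mean()
    Wq = W.copy()
    Wq[mask] = centers[assign]   # each weight now stores a cluster index
    return Wq

W_sparse, mask = prune_smallest(np.random.randn(512, 512), frac=0.9)
W_quant = kmeans_quantize(W_sparse, mask, k=64)
```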

SLIDE 15

Our approach

• The proposed framework (review)

The baseline model → Reconstruct baseline with SVD → Prune redundant connections → Cluster remaining connections

SLIDE 16

Experiments

• Training set
  • CASIA OLHWDB1.0 & OLHWDB1.1
  • 720 writers, 2,693,183 samples, 3755 classes
• Test set
  • ICDAR 2013 online competition dataset
  • 60 writers, 224,590 samples, 3755 classes
• Data preprocessing and augmentation as described earlier

SLIDE 17

Experiments

• Details of the baseline model
  • Main storage cost: LSTM2, FC1, FC2
  • Main computation cost: LSTM2

SLIDE 18

Experiments

• Experimental settings
  • In our experiments, we found that the LSTM is more sensitive to its input connections than to its hidden-hidden connections
  • Most of the computation latency is introduced by the hidden-hidden connections

SLIDE 19

Experiments

• Experimental results (Intel Core i7-4790, single thread)
  • After SVD, the model is 10× smaller and FLOPs are also reduced by 10×
  • After ADW & quantization, the model is 31× smaller and FLOPs are further reduced
  • Only a minor 0.5% drop in accuracy

SLIDE 20

Experiments

• Experimental results
  • Compared with [1], our model is 300× smaller and 4× faster on CPU
  • Compared with [2], our model is 52× smaller and 109× faster on CPU

[1] W. Yang, L. Jin, et al., "DropSample: A new training method to enhance deep convolutional neural networks for large-scale unconstrained handwritten Chinese character recognition", Pattern Recognition, 2016
[2] X.-Y. Zhang, et al., "Online and offline handwritten Chinese character recognition: A comprehensive study and new benchmark", Pattern Recognition, 2017

SLIDE 21

Conclusion

• SVD is effective for accelerating computation
• ADW also works well for LSTMs
• By combining SVD and ADW, we can build fast and compact LSTM-based models for online HCCR

SLIDE 22

Thank you!

Lianwen Jin (金连文), Ph.D., Professor
eelwjin@scut.edu.cn / lianwen.jin@gmail.com
Zecheng Xie (谢泽澄), Ph.D. student
Yafeng Yang (杨亚锋), Master's student
http://www.hcii-lab.net/