

  1. Parsimonious HMMs for Offline Handwritten Chinese Text Recognition. Wenchao Wang, Jun Du and Zi-Rui Wang, University of Science and Technology of China. ICFHR 2018, Niagara Falls, USA, Aug. 5-8, 2018

  2. Background
     • Offline handwritten Chinese text recognition (OHCTR) is challenging:
       – No trajectory information in comparison to the online case
       – Large vocabulary of Chinese characters
       – Sequential recognition with the potential segmentation problem
     • Approaches:
       – Oversegmentation approaches: character oversegmentation followed by classification
       – Segmentation-free approaches:
         – GMM-HMM: Gaussian mixture model - hidden Markov model
         – MDLSTM-RNN: multidimensional LSTM-RNN with CTC
         – DNN-HMM: deep neural network - hidden Markov model

  3. Review of HMM Approach for OHCTR
     • A left-to-right HMM is adopted to represent each Chinese character.
     • The character HMMs are concatenated to model the text line.
     [Figure: a text line (characters 反, 映, 得, 到) modeled by a sequence of concatenated character HMMs over the observation sequence of sliding windows]
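To make the concatenation concrete, here is a minimal Python sketch (not the authors' code) of chaining per-character left-to-right HMMs into one text-line model; the 5-state topology and the transition values are illustrative assumptions.

```python
import numpy as np

def left_to_right_hmm(n_states=5, p_stay=0.6):
    """Transition matrix of one character HMM: each state either
    stays (p_stay) or moves to the next state (1 - p_stay)."""
    A = np.zeros((n_states, n_states))
    for s in range(n_states - 1):
        A[s, s] = p_stay
        A[s, s + 1] = 1.0 - p_stay
    A[-1, -1] = p_stay          # last state: stay, or exit the model
    return A

def concatenate(char_hmms):
    """Chain character HMMs: the exit probability of each model's last
    state feeds the first state of the next character's model."""
    n = sum(a.shape[0] for a in char_hmms)
    A = np.zeros((n, n))
    offset = 0
    for i, a in enumerate(char_hmms):
        k = a.shape[0]
        A[offset:offset + k, offset:offset + k] = a
        if i + 1 < len(char_hmms):      # link to the next character
            A[offset + k - 1, offset + k] = 1.0 - a[-1, -1]
        offset += k
    return A

# A four-character text line as four concatenated 5-state HMMs:
line = concatenate([left_to_right_hmm() for _ in "反映得到"])
print(line.shape)   # (20, 20)
```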

  4. Review of DNN-HMM Approach for OHCTR
     • The Bayesian framework: the recognized character sequence Ĉ maximizes the posterior, Ĉ = argmax_C P(C|X) = argmax_C p(X|C) P(C), where p(X|C) is computed with the concatenated character HMMs (character modeling).
     • Output distribution: a DNN calculates the state posterior probability P(s|x_t), which is converted to the scaled likelihood p(x_t|s) ∝ P(s|x_t) / P(s) using the state prior P(s).
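The output-distribution step is the standard hybrid DNN-HMM conversion; a minimal sketch, with illustrative names (the paper gives no code):

```python
# Hybrid DNN-HMM trick: the DNN outputs state posteriors P(s|x), while the
# HMM decoder needs likelihoods. Dividing by the state prior P(s),
# estimated from state counts in the forced alignment, gives a scaled
# likelihood p(x|s) up to a constant.
import numpy as np

def posteriors_to_scaled_loglik(log_posteriors, state_priors, floor=1e-8):
    """log p(x|s) + const = log P(s|x) - log P(s); priors are floored
    to avoid dividing by zero for states never seen in alignment."""
    return log_posteriors - np.log(np.maximum(state_priors, floor))
```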

  5. Motivation
     • High demand on memory and computation from the DNN output layer
     • Model redundancy due to similarities among different characters
     • Parsimonious HMMs to address these two problems
     • Decision-tree-based two-step approach to generate the tied-state pool
     [Figure: 5-state HMMs for the characters 冻, 缴 and 练 sharing states from a common tied-state pool]

  6. Binary Decision Tree for State Tying
     • The parent set O_1 has a distribution P_1(x); the total log-likelihood of all observations in O_1 under this distribution is L(O_1) = Σ_{x ∈ O_1} log P_1(x).
     • One question splits the parent into two child sets. The child set O_2 has a distribution P_2(x), with total log-likelihood L(O_2) = Σ_{x ∈ O_2} log P_2(x).
     • The other child set O_3 has a distribution P_3(x), with total log-likelihood L(O_3) = Σ_{x ∈ O_3} log P_3(x), where O_1 = O_2 ∪ O_3.
     • The total increase in set-conditioned log-likelihood of the observations due to the partitioning is L(O_2) + L(O_3) - L(O_1).
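As a worked sketch of these quantities: if each set of observations is modeled by a single diagonal Gaussian fitted by maximum likelihood from its first- and second-order statistics (the classic assumption in decision-tree state tying), L(O) has a closed form and the split gain follows directly. Names below are mine, not the authors'.

```python
import numpy as np

def set_loglik(n, sum_x, sum_x2, var_floor=1e-6):
    """Total log-likelihood L(O) of n frames under the ML-fitted diagonal
    Gaussian; sum_x and sum_x2 are per-dimension sums of x and x*x."""
    mean = sum_x / n
    var = np.maximum(sum_x2 / n - mean ** 2, var_floor)
    d = len(mean)
    # For an ML-fitted Gaussian the quadratic term sums to n*d/2.
    return -0.5 * n * (d * np.log(2 * np.pi) + np.log(var).sum() + d)

def split_gain(parent, yes, no):
    """L(O2) + L(O3) - L(O1); each argument is a (n, sum_x, sum_x2) tuple."""
    return set_loglik(*yes) + set_loglik(*no) - set_loglik(*parent)
```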

  7. Step 1: Clustering Characters with a Decision Tree
     • All states at the same HMM position are initially grouped together at the root node.
     • Each node is then recursively partitioned with the question set so as to maximize the increase in expected log-likelihood.
     • All states in the leaves of the decision tree are tied together.
     [Figure: a tree fragment for tying the first HMM state, with questions such as "Is the character in {愧 怀 怳 忧 快 忱 恍 恢 悦 惋 惯}?" branching Yes/No down to leaf nodes]
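A minimal sketch of this greedy top-down splitting, assuming a question is simply a set of characters ("is this state's character in the set?") and reusing the hypothetical set_loglik/split_gain helpers from the previous sketch:

```python
def best_question(states, questions, stats_of):
    """Pick the question whose yes/no partition of this node's states
    maximizes the expected log-likelihood increase."""
    parent, best_q, best_gain = stats_of(states), None, 0.0
    for q in questions:
        yes = [s for s in states if s.char in q]
        no = [s for s in states if s.char not in q]
        if not yes or not no:
            continue                      # question does not split the node
        gain = split_gain(parent, stats_of(yes), stats_of(no))
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q, best_gain

def grow_tree(states, questions, stats_of, min_gain):
    """Recursively partition; a node whose best split gains too little
    becomes a leaf, and all its states are tied together."""
    q, gain = best_question(states, questions, stats_of)
    if q is None or gain < min_gain:
        return states                     # leaf: one tied state
    yes = [s for s in states if s.char in q]
    no = [s for s in states if s.char not in q]
    return (q, grow_tree(yes, questions, stats_of, min_gain),
               grow_tree(no, questions, stats_of, min_gain))
```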

  8. Step 2: Bottom-up Re-clustering
     • In the second step, the clusters in the leaf nodes obtained from the first step are re-clustered by a bottom-up procedure using sequential greedy optimization.
     • The expected log-likelihood decrease from combining every pair of clusters is calculated.
     • A minimum priority queue is maintained so that, while the number of clusters exceeds N, the two clusters with the minimum log-likelihood decrease are merged into a new cluster; the remaining clusters form the tied-state pool.
     [Figure: decision-tree leaf nodes feeding a minimum priority queue of cluster pairs, which is repeatedly popped to merge clusters until the tied-state pool is generated]
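A minimal sketch of this merge loop with Python's heapq, using lazy deletion for pairs invalidated by earlier merges; merge_cost and merge are hypothetical callbacks over cluster statistics, not the authors' API:

```python
import heapq

def recluster(clusters, target_n, merge_cost, merge):
    """clusters: dict of int id -> statistics; merge_cost(a, b) is the
    expected log-likelihood decrease L(a) + L(b) - L(a U b) >= 0."""
    heap = [(merge_cost(clusters[i], clusters[j]), i, j)
            for i in clusters for j in clusters if i < j]
    heapq.heapify(heap)
    next_id = max(clusters) + 1
    while len(clusters) > target_n and heap:
        cost, i, j = heapq.heappop(heap)
        if i not in clusters or j not in clusters:
            continue                      # stale pair, a member was merged
        new = merge(clusters.pop(i), clusters.pop(j))
        for k, c in clusters.items():     # queue pairs with the new cluster
            heapq.heappush(heap, (merge_cost(new, c), next_id, k))
        clusters[next_id] = new
        next_id += 1
    return clusters                       # the tied-state pool
```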

  9. Training Procedure for Parsimonious HMMs
     1. Train the conventional GMM-HMM system.
     2. Calculate the first-order and second-order statistics based on a state-level forced alignment.
     3. Run the two-step algorithm:
        – First step: build the state-tying tree.
        – Second step: re-cluster the tied states produced by the first step.
     4. Train parsimonious GMM-HMMs based on the tied states.
     5. Train parsimonious DNN-HMMs based on the tied states.
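Step 2 of this recipe might look like the following minimal sketch, assuming `alignments` yields (frames, state_ids) pairs from the forced alignment; all names are illustrative:

```python
import numpy as np
from collections import defaultdict

def accumulate_stats(alignments, dim):
    """Per-state occupancy plus first- and second-order statistics,
    exactly the (n, sum_x, sum_x2) triples the tying steps consume."""
    stats = defaultdict(lambda: [0, np.zeros(dim), np.zeros(dim)])
    for frames, state_ids in alignments:      # frames: array of shape (T, dim)
        for x, s in zip(frames, state_ids):
            acc = stats[s]
            acc[0] += 1                       # occupancy count n
            acc[1] += x                       # first-order sum
            acc[2] += x * x                   # second-order sum
    return stats
```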

  10. Experiments
      • Training set: CASIA-HWDB database, including HWDB1.0, HWDB1.1 and HWDB2.0-HWDB2.2
      • Test set: ICDAR 2013 competition set
      • Vocabulary: 3980 character classes
      • GMM-HMM system
        – Each character modeled by a left-to-right HMM with 40-component GMMs
        – Gradient-based features followed by PCA to obtain a 50-dimensional vector
      • DNN-HMM system: 350-2048-2048-2048-2048-2048-2048-3980*N (N states per character HMM)
      • DNN-PHMM system: 350-2048-2048-2048-2048-2048-2048-M (M tied states)
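To make the layer notation concrete, a PyTorch-style sketch of the two networks; the framework, the activation, and the example values of N and M are my assumptions, not the paper's:

```python
import torch.nn as nn

def build_dnn(n_out, n_in=350, n_hidden=2048, n_layers=6):
    """350-2048x6-n_out multilayer perceptron; softmax is applied
    inside the training loss, not in the network itself."""
    layers, d = [], n_in
    for _ in range(n_layers):
        layers += [nn.Linear(d, n_hidden), nn.Sigmoid()]
        d = n_hidden
    layers.append(nn.Linear(d, n_out))
    return nn.Sequential(*layers)

dnn_hmm = build_dnn(3980 * 5)   # DNN-HMM: 3980 characters x N states (N=5 assumed)
dnn_phmm = build_dnn(2000)      # DNN-PHMM: M tied states (M=2000 illustrative)
```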

  11. HMM vs. PHMM
      • Performance saturates as the number of states per character increases.
      • PHMM outperforms HMM given the same number of tied states.
      • The best PHMM is far more parsimonious than the best HMM.
      • This demonstrates the soundness of the proposed state-tying algorithm.

  12. HMM vs. PHMM
      • The model becomes much more compact when the number of tied states per character (Ns) is set below 1.
      • DNN-PHMM (Ns = 0.5, 9.52%) outperforms DNN-HMM (Ns = 1, 11.09%).

  13. Memory and Computation Costs
      DNN-PHMM with the (1024, 4) setting achieved a CER comparable to DNN-HMM with the (2048, 6) setting, while reducing the model size by 75% and the run-time latency by 72%.
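A back-of-the-envelope weight count shows where the savings come from: the hidden layers shrink quadratically with the width and the output layer shrinks with the tied-state count. N and M below are assumed values, so the printed ratio is only indicative, not the paper's measurement:

```python
def n_weights(n_in, h, n_layers, n_out):
    """Weight-matrix entries only; biases ignored for simplicity."""
    return n_in * h + (n_layers - 1) * h * h + h * n_out

big = n_weights(350, 2048, 6, 3980 * 5)   # DNN-HMM, (2048, 6), N = 5 assumed
small = n_weights(350, 1024, 4, 2000)     # DNN-PHMM, (1024, 4), M = 2000 assumed
print(f"DNN-PHMM / DNN-HMM size: {small / big:.2f}")
```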

  14. State Tying Result Analysis

      Similar characters                  | Tied radical part | Structure
      喷 喻 嗅 嗡 吃 咆 哦 哨 嘈 嘲 噬 嚼 | 口                | Left-right
      客 害 容 密 寇 蜜 穷 穿 突 窃 窍 窑 | 宀                | Top-bottom
      圃 圆 囚 囤 困 围 固                | 口                | Surround
      巨 匝 匠 匡 匣 匪 匹 医 匿 臣       | 匚                | Left-surround
      诞 巡 边 逊 辽 达 谜 迁 迂 过 近 这 | 辶                | Bottom-left-surround
      澜 阐 阑 鬲 闸 闻 闽 润             | 门                | Top-surround
      串 吊 甲 牢 帛 早 平                | 丨                | Cross
      氛 氢 氦 氨                         | 气                | Top-right-surround

      Chinese characters with the same or similar radicals were easily tied by the proposed algorithm. This is why the proposed DNN-PHMM can still maintain high recognition performance despite its very compact design.

  15. Thanks!
