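A Brief Discussion of Text Recognition: New Observations, New Thoughts, New Opportunities
Lianwen Jin (金连文)
South China University of Technology
October 16, 2019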
Outline
Introduction
Recent Progress and Trends
Scene Text Detection
Scene Text Recognition
End2End Scene Text
Text images are everywhere in daily life
Given an image that contains text, in the vast majority of cases the text is the most informative part of the image!
– Scene text detection and recognition (Scene Text Recognition, STR)
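Online handwritten character recognition (Online HCR): online handwritten text recognition; online signature and handwriting identification; online mathematical expression recognition; …
Optical character recognition (OCR): handwritten text recognition; printed text recognition
Character; text line; page; complex layout; …
Newspapers and books; scanned documents; ID cards and license plates; forms and business cards; …
Scene text
Text detection; text recognition; character / word / text line; end-to-end recognition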
1. Arbitrarily oriented
2. Irregular text, perspective distortion
3. Scale diversity
4. Ambiguity of annotation: char, word, text; label sequence order
5. Completeness and tightness (IoU >= 0.5?)
6. Arbitrary variation of text appearances
7. Different types of imaging artifacts
8. Complicated image background
9. Uneven lighting
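The "completeness and tightness" item can be illustrated with a small sketch. It assumes axis-aligned boxes given as (x1, y1, x2, y2), and the box coordinates are made-up values for illustration. A detection that cuts off part of a word and one that is loosely padded can both pass an IoU >= 0.5 threshold:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground-truth word box: 100 px wide, 20 px tall.
gt = (0, 0, 100, 20)
# Detection covering only the first 60% of the word (incomplete).
truncated = (0, 0, 60, 20)
# Detection covering the whole word with generous padding (not tight).
loose = (-10, -5, 110, 25)

for name, box in [("truncated", truncated), ("loose", loose)]:
    iou = box_iou(gt, box)
    print(name, round(iou, 3), "accepted" if iou >= 0.5 else "rejected")
```

Both boxes are accepted at the 0.5 threshold, although the first would lose characters at recognition time and the second would include extra background.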
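Examples of scene text detection methods:
Regression-based methods
Segmentation-based methods
Hybrid methods (segmentation + regression)
Development trends:
Horizontal rectangular boxes → multi-oriented rectangular boxes → multi-oriented quadrilaterals → curved text → arbitrary shapes
Segmentation-based methods have difficulty accurately separating adjacent or overlapping text
Regression-based methods have difficulty detecting long text completely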
YC Xu, et al., TextField: Learning A Deep Direction Field for Irregular Scene Text Detection, IEEE TIP 2019.
– a two-dimensional unit vector that points away from its nearest text boundary pixel
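A minimal sketch of such a direction field, assuming a binary text mask (1 = text, 0 = background) that contains at least one background pixel. The brute-force nearest-pixel search and the function name `direction_field` are illustrative choices, not the paper's implementation:

```python
import math


def direction_field(mask):
    """For each text pixel, return the unit vector pointing away from its
    nearest non-text pixel; background pixels get (0.0, 0.0).

    mask is a list of rows of 0/1 values; vectors are (dx, dy) tuples.
    """
    h, w = len(mask), len(mask[0])
    background = [(x, y) for y in range(h) for x in range(w) if mask[y][x] == 0]
    field = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 0:
                continue
            # Brute-force search for the nearest background pixel.
            nx, ny = min(background, key=lambda q: (q[0] - x) ** 2 + (q[1] - y) ** 2)
            dx, dy = x - nx, y - ny
            norm = math.hypot(dx, dy)
            field[y][x] = (dx / norm, dy / norm)
    return field


# Toy mask: a 3x3 text region surrounded by background.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
field = direction_field(mask)
print(field[2][1])  # left edge of the text region, points right (into the text)
print(field[2][3])  # right edge of the text region, points left
```

Pixels on opposite sides of a text instance get opposite directions, which is the property that lets adjacent instances be separated.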
CVPR 2019 (Oral).
curved text detection) and other new modules
Similar vectors)
Aggregation method
W Wang, E Xie, et al., Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network, ICCV 2019
Multi-scale problem (large scale variations)
images in the MSRA-TD500” -- C Xue, et al., IJCAI 2019
Learning geometric properties of text/char/pixel regions, e.g.:
Char/text center line; char/text border offset; char/text center offset; char/text vertex offset; character affinity; corner point; visual relationship; …
M Yang, ScRN, ICCV 2019 (recognition)
S Long, et al., TextSnake, ECCV 2018
P Wang, et al., SAST, MM 2019
Examples of anchor-free regression methods:
Why anchor-free? Most RPN regression methods need well-chosen anchor parameters
Alternative anchor design?
step toward more general object detection, AAAI 2019.
anchor strings based on length instead of IoU.
without any modifications of hyperparameters or specialized optimization.
Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019.
– Experimental Data
– To verify the robustness & adaptivity, we maintained the same hyperparameters for DeRPN throughout all of our experiments without any modifications
– range of [𝟗 𝒓, 𝟐𝟏𝟑𝟓 𝒓]
– The largest deviation (ratio) between the anchor string and object edge is at most
𝒓
– Higher recall rate
– Tighter bounding box
– Can only deal with rectangular bounding boxes
– For two-stage frameworks only
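The idea of matching anchor strings by length rather than IoU can be sketched as follows. The anchor-string values (a geometric progression with common ratio 2) and the nearest-in-log-scale matching rule are assumptions made for illustration; they are not taken from the slide and may differ from the settings in the DeRPN code:

```python
import math

# Assumed anchor strings: a geometric progression with common ratio 2.
ANCHOR_STRINGS = [16, 32, 64, 128, 256, 512, 1024]


def match_anchor_string(edge_length):
    """Pick the anchor string closest to the edge length on a log scale."""
    return min(ANCHOR_STRINGS, key=lambda a: abs(math.log(edge_length / a)))


def deviation_ratio(edge_length):
    """Ratio (>= 1) between an object edge and its matched anchor string."""
    a = match_anchor_string(edge_length)
    return max(edge_length / a, a / edge_length)


# Width and height are matched independently, so no anchor boxes or
# aspect ratios need to be enumerated.
for width, height in [(300, 24), (40, 40), (900, 60)]:
    print((width, height), "->", (match_anchor_string(width), match_anchor_string(height)))

# Within the covered range the worst-case ratio is bounded by the square
# root of the common ratio of the progression.
worst = max(deviation_ratio(e) for e in range(16, 1025))
print(worst <= math.sqrt(2))
```

Because each edge is matched on its own, a long and thin text line is handled the same way as a square object, which is one plausible reading of why no hyperparameter changes were needed across tasks.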
Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019. Code: https://github.com/HCIILAB/DeRPN
Char, word, or line; label sequence order
not solved this problem completely.
Solution A (CVPR17) Solution B (TIP18) Solution C (ACM MM18)
https://mp.weixin.qq.com/s/pxLR0R7tT7Rbhu-NFfv_aA
Yuliang Liu, et al., IJCAI 2019.
Discretization, IJCAI 2019.
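hyperparameter (no parameter tuning needed; reaches SOTA after about 3 hours of training)
This model was the basis of the winning entry in the ICDAR 2019 ReCTS detection task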
Yuliang Liu, et al., Omnidirectional Scene Text Detection with Sequential-free Box Discretization, IJCAI
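Examples of scene text recognition methods:
CTC-based methods
Attention-based methods
Recent development trends:
Regular text → irregular text recognition
CTC → attention (1D, 2D)
Detection + recognition → end-to-end detection and recognition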
1. Perspective distortion, irregular, arbitrarily oriented
2. Text line curvature
3. Arbitrary variation of text appearances / styles, e.g., calligraphic fonts, hybrid horizontal/vertical text
4. Different types of imaging artifacts
5. Uneven lighting
6. Complicated image background (e.g., the redundant-background problem)
7. Low resolution
8. Scale diversity
9. Heavy occlusions
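Example approaches to irregular text recognition:
Rectification-based methods, e.g.: RARE, ASTER, MORAN, ESIR
IEEE TPAMI 2018. Open-sourced.
Recognition, 2019. Open-sourced.
CVPR 2019
2D attention-based methods, e.g.:
Character-level recognition methods, e.g.: Char-Net, Mask TextSpotter
Shapes, ECCV 2018, IEEE TPAMI 2019. Open-sourced.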
B Shi, et al., ASTER: An Attentional Scene Text Recognizer with Flexible Rectification, IEEE TPAMI 2018.
TPS+CNN+BLSTM+Attention
M Yang, et al., Symmetry-constrained Rectification Network for Scene Text Recognition, ICCV 2019
F Zhan, S Lu, et al., ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification, CVPR 2019
Pattern Recognition, 2019. Code: https://github.com/Canjie-Luo/MORAN_v2
text with attention mechanisms. IJCAI, 2017.
strong baseline for irregular text recognition, AAAI 2019
AAAI 2019.
Cross Attention, ICDAR 2019
YC Wu, F Yin, XY Zhang, L Liu, CL Liu, SCAN: Sliding Convolutional Attention Network for Scene Text Recognition, arXiv 2018.
Figure labels: encoder, decoder, training, windows
P Wang, L Yang, et al., Simple and Robust Convolutional-Attention Network for Irregular Text Recognition, arXiv 2019
recognition
Probability distribution map; path transition map
Fast, and can handle curved text recognition
F Cong, W Hu, Q Huo, L Guo, A Comparative Study of Attention-Based Encoder-Decoder Approaches to Natural Scene Text Recognition, ICDAR 2019
H Ding, K Chen, et al., ICDAR 2017
CTC:
– Can hardly be applied directly to 2D prediction
– Large computation involved for long sequences
– Performance degradation for repeated patterns
Attention:
– Misalignment problem (attention drift)
– More memory required
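One plausible reason repeated patterns are awkward for CTC can be seen in its collapse rule, sketched below under the usual convention (merge consecutive repeats, then drop blanks). The blank symbol "-" and the helper names are illustrative choices:

```python
BLANK = "-"


def ctc_collapse(path):
    """Collapse a frame-level CTC path: merge consecutive repeats, drop blanks."""
    out = []
    prev = None
    for symbol in path:
        if symbol != prev and symbol != BLANK:
            out.append(symbol)
        prev = symbol
    return "".join(out)


def min_frames(label):
    """Minimum number of frames needed to emit a label: one per character,
    plus one blank between every pair of identical neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats


print(ctc_collapse("hheel-lloo"))  # blank separates the two l's
print(ctc_collapse("hheelllloo"))  # without a blank the double l is merged
print(min_frames("hello"))
```

A doubled character survives only if the model places a blank frame exactly between the two occurrences, so labels with many repeats need more frames and a more precise alignment.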
Aggregation cross-entropy (ACE)
distributions over all the classes;
Z Xie, et al., Aggregation Cross-Entropy for Sequence Recognition, CVPR 2019 (oral) Code: https://github.com/summerlvsong/Aggregation-Cross-Entropy
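A minimal sketch of the ACE idea as it can be read from the paper: per-step class probabilities are summed over time and normalized, the label is reduced to normalized character counts (with the remaining steps assigned to blank), and the loss is the cross-entropy between the two distributions. The function name, the "-" blank symbol, and the toy probabilities are illustrative assumptions; the linked repository is the reference implementation:

```python
import math
from collections import Counter


def ace_loss(probs, label, classes, blank="-"):
    """probs: list of T dicts mapping each class to its probability at that step."""
    T = len(probs)
    counts = Counter(label)
    counts[blank] = T - len(label)  # steps not used by characters count as blank
    loss = 0.0
    for k in classes:
        y_bar = sum(p[k] for p in probs) / T  # aggregated, normalized prediction
        n_bar = counts[k] / T                 # normalized occurrence count
        if n_bar > 0:
            loss -= n_bar * math.log(max(y_bar, 1e-12))
    return loss


classes = ["-", "a", "b"]
# Toy predictions over T = 4 steps that roughly spell "a", blank, "b", blank.
good = [
    {"-": 0.1, "a": 0.8, "b": 0.1},
    {"-": 0.8, "a": 0.1, "b": 0.1},
    {"-": 0.1, "a": 0.1, "b": 0.8},
    {"-": 0.8, "a": 0.1, "b": 0.1},
]
# Toy predictions that put most of the mass on "a" at every step.
bad = [{"-": 0.1, "a": 0.8, "b": 0.1}] * 4

print(ace_loss(good, "ab", classes) < ace_loss(bad, "ab", classes))
# Only character counts matter, so the label order does not change the loss.
print(ace_loss(good, "ab", classes) == ace_loss(good, "ba", classes))
```

The loss needs no alignment and no recurrence over time steps, which is what makes it cheap; the trade-off is that character order is not supervised directly.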
– Prevents errors from accumulating: errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
– Joint optimization helps improve overall performance
– Easier to maintain and adapt to new domains: maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
– Faster, smaller, stronger
Xuebo Liu et al., FOTS: Fast Oriented Text Spotting with a Unified Network, CVPR 2018
– Left to right; top to bottom…
– Different training-sample requirements; different training difficulty and time
– X Liu, et al., FOTS, CVPR 2018
– T. He, et al., TextSpotter, CVPR 2018
– H Li et al., Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of “H Li et al., ICCV 2017”)
(Arbitrary-shape e2e)
– Y Sun, et al., TextNet, ACCV 2018
– M Liao, P Lyu, et al., Mask TextSpotter, TPAMI 2019
– S Qin, A Bissacco, et al. (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
Network architecture
dimension extension, achieving end-to-end text detection and recognition of the text content.
Mask branch
Pengyuan Lyu*, Minghui Liao*, et al. Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes, ECCV 2018. TPAMI 2019
Mask R-CNN framework + attention decoder
RoI masking
Uses million-scale external data: OCR engine output serves as annotation (partial labelling)
Results are not comparable with those of other methods
S Qin, A Bissacco, et al. (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019
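Even without ever having learned it, most people can still accurately recognize the text in the image on the left
Most pharmacists can easily read the prescription in the image below
Generalization ability? NLP + OCR?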
Zero-shot learning. Guess which character is carved here?
Accepted by ICCV 2019 (oral)
Evaluation Protocol for Scene Text Detection, CVPR 2019. Code:
https://github.com/Yuliang-Liu/TIoU-metric
– CY Lee, Y Baek, H Lee, TedEval: A Fair Evaluation Metric for Scene Text Detectors, arXiv 201907
– HS Lee, Y Yoon, et al., PopEval: A Character-Level Approach to End-To-End Evaluation Compatible with Word-Level Benchmark Dataset, ICDAR 2019 Workshop
Performance of deep learning based approaches are highly
Do we have enough data?
17/19, RCTW, SCUT-CTW1500, TotalText, MTWI, LSVT, ReCTS, ArT…
What’s the next good and challenging dataset?
TextNet-1M? 1 million text images that contain:
Synth90k (MJSynth)
text recognition. NIPS 2014
SynthText: A. Gupta et al., Synthetic Data for Text Localisation in Natural Images, CVPR 2016
Verisimilar: Zhan et al., Verisimilar Image Synthesis for Accurate Detection and Recognition
SF-GAN: F. Zhan, H. Zhu, S. Lu, Spatial Fusion GAN for Image Synthesis, CVPR 2019.
SynthText3D: M. Liao, et al., SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds, arXiv 20190713
Synthetic data still looks somewhat fake?
irregular text recognition, AAAI 2019
           IIIT5K  SVT   IC13  IC15  SVTP  CT80  COCO-T
OnlySynth  91.5    84.5  91.0  69.2  76.4  83.3
           95.0    91.2  94.0  78.8  86.4  89.6  66.8
ICDAR 2019 LSVT
400,000 partially annotated images (massive!); Train: 30000; Test: 20000
ICDAR 2019 ArT
Curved text, irregular shapes; Train: 5603; Test: 4563
MLT 2019
Multilingual: 10 languages; Train: 10000; Test: 10000
ICDAR 2019 ReCTS
Street-view signboards; 600,000 character-level annotated samples; Train: 20000; Test: 5000
ICDAR 2019 ST-VQA
Text VQA
We received an overwhelming number of submissions
A total of 78 submissions from 46 unique teams/individuals were received
The top-performing scores for each task:
i) T1: 82.65% (detection, H-mean)
ii) T2.1: 74.30% (recognition, Latin script only, word accuracy)
iii) T2.2: 85.32% (recognition, Latin & Chinese, 1-NED)
iv) T3.1: 53.86% (end-to-end text spotting, Latin script only, 1-NED)
v) T3.2: 54.91% (end-to-end text spotting, Latin & Chinese, 1-NED)
End-to-end text spotting seems to be the most challenging task (1-NED < 55%)
There is still much room and research opportunity for further improvement, esp. in recognition and end-to-end text spotting
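For readers unfamiliar with the 1-NED score, a minimal sketch follows. It assumes the common definition: one minus the Levenshtein distance normalized by the longer of the two strings, averaged over all samples. The exact normalization used by each competition task may differ, and the function names are illustrative:

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def one_minus_ned(predictions, ground_truths):
    """Average of 1 - normalized edit distance over paired strings."""
    total = 0.0
    for pred, gt in zip(predictions, ground_truths):
        longest = max(len(pred), len(gt))
        total += edit_distance(pred, gt) / longest if longest else 0.0
    return 1.0 - total / len(predictions)


# One exact match and one prediction with a single wrong character.
score = one_minus_ned(["hello", "wor1d"], ["hello", "world"])
print(round(score, 3))
```

Unlike word accuracy, this score gives partial credit for nearly correct strings, which matters for long Chinese text lines.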
Real-world industrial scenarios; scenarios defined by academia
Chargrid Model for Document Understanding
Information extraction of Invoice Document
AR Katti, et al, Chargrid: Towards Understanding 2D Documents, EMNLP 2018.
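The chargrid representation can be sketched roughly as follows, assuming OCR output in the form of characters with pixel bounding boxes: every pixel inside a character's box is filled with that character's integer index, and all other pixels are 0. The function name, the vocabulary, and the toy invoice fragment are hypothetical and only illustrate the encoding:

```python
def build_chargrid(chars, width, height, vocab):
    """Encode OCR characters as a 2D grid of character indices.

    chars: iterable of (character, x1, y1, x2, y2) with exclusive x2, y2.
    vocab: maps characters to positive integers; 0 is reserved for background.
    """
    grid = [[0] * width for _ in range(height)]
    for ch, x1, y1, x2, y2 in chars:
        index = vocab[ch]
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                grid[y][x] = index
    return grid


# Hypothetical fragment of an invoice: the characters "A" and "1" side by side.
vocab = {"A": 1, "1": 2}
chars = [("A", 0, 0, 2, 2), ("1", 3, 0, 5, 2)]
grid = build_chargrid(chars, width=6, height=2, vocab=vocab)
for row in grid:
    print(row)
```

The grid keeps the 2D layout of the page, so a segmentation-style network can use both what a character is and where it sits when extracting fields.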
Document structure & entity extraction
e.g., the “Key-Value” issue
X Liu, F Gao, et al, Graph Convolution for Multimodal Information Extraction…, NAACL 2019.
answering…, CVPR 2018.
ICDAR 2019
Question Answering
Dataset: https://github.com/HCIILAB/Scene-Text-Removal Code: coming soon…
L Wu, C Zhang et al, Editing Text in the Wild, ACM MM 2019.
SRNet: Style retention network
Semi- or weakly- supervised learning
X Qin, Y Zhou, et al., Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning, ICDAR 2019
YP Sun, et al., Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning, ICCV 2019.
S Qin, A. Bissacco, Towards Unconstrained End-to-End Text Spotting, ICCV 2019
Z Xie, et al., Weakly supervised precise segmentation for historical document images, Neurocomputing, 2019.
Adversarial learning, e.g.:
AK Bhunia, et al., Handwriting Recognition in Low-resource Scripts using Adversarial Learning, CVPR 2019
Robustness
Adaptation, worst-case robustness, Self exploration, Withstand adversarial attack, …
Specification
Bias of data, alignment between model and human preferences…
Assurance
Monitoring/controlling of system, Interpretability
Confidence / out-of-distribution detection (OOD) issues
Learning, CVPR 2018
the Training Data and How to Mitigate the problem, CVPR 2019.
SafeML ICLR 2019 Workshop.
know, arXiv 201909. (Knowing what you don't know…)
SafeML ICLR 2019 Workshop, https://sites.google.com/view/safeml-iclr2019
金连文 (JIN Lianwen)
eelwjin@scut.edu.cn lianwen.jin@gmail.com