
SLIDE 1

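A Brief Talk on Text Recognition: New Observations, New Thoughts, New Opportunities

Lianwen Jin (金连文)

South China University of Technology

October 16, 2019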

SLIDE 2

Outline

• Introduction
• Recent Progress and Trends
  – Scene Text Detection
  – Scene Text Recognition
  – End2End Scene Text Recognition
• Discussion and Future Prospects

SLIDE 3

Text is the most important carrier for exchanging information and perceiving the world

Text images are everywhere in daily life

SLIDE 4

Without text, it is hard for us to understand the world

SLIDE 5

Without text, it is sometimes hard for us to understand an image

SLIDE 6

Text makes our lives rich and colorful…

SLIDE 7

The importance of text

Given an image that contains text, in the vast majority of cases the text is the most informative part of the image!

SLIDE 8

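DAR, OCR, STR

Document Analysis & Recognition (DAR)

• Optical Character Recognition (OCR)
  – Handwritten text recognition
  – Printed text recognition
  – Character, text line, full page, complex layout, …
  – Newspapers, books, scanned documents, ID cards and certificates, license plates, forms, business cards, …
  – Scene text: Scene Text Recognition (STR), i.e. scene text detection and recognition
• Online Handwritten Character Recognition (Online HCR)
  – Online handwritten text recognition
  – Online signature and handwriting identification
  – Online mathematical formula recognition
  – …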

SLIDE 9

Scene Text Detection and Recognition (STR)

• Text detection
• Text recognition
  – Character / word / text line
• End-to-end recognition

SLIDE 10


SLIDE 11

Challenges of Scene Text Detection

1. Arbitrarily oriented text
2. Irregular text, perspective distortion
3. Scale diversity
4. Ambiguity of annotation: char, word, or text; label sequence order
5. Completeness and tightness: IoU >= 0.5?
6. Arbitrary variation of text appearances
7. Different types of imaging artifacts
8. Complicated image background
9. Uneven lighting
10. Low resolution
11. Heavy overlay
12. Long text detection
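The "completeness and tightness" concern in item 5 can be illustrated with a minimal sketch, assuming axis-aligned boxes in (x1, y1, x2, y2) form. The function name and the example boxes are hypothetical and chosen only for illustration; real benchmarks usually work with rotated boxes or polygons.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Hypothetical example: the ground truth is a long text line,
# and the detection covers only its left half.
ground_truth = (0, 0, 100, 20)
detection = (0, 0, 50, 20)

iou = box_iou(detection, ground_truth)
print(iou, iou >= 0.5)  # 0.5 True
```

Under an IoU >= 0.5 rule this detection counts as correct even though half of the text is missing, which is the kind of case the question mark on the slide points at.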

SLIDE 12

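Scene Text Detection

• Examples of scene text detection methods:
  – Regression-based methods
      Gupta et al., CVPR 2016; Tian et al., ECCV 2016;
      Shi, Bai et al., CVPR 2017; Liu et al., CVPR 2017;
      Liao et al., AAAI 2017; Hu et al., ICCV 2017; …
  – Segmentation-based methods
      Zhong et al., CVPR 2016; Zhou et al., CVPR 2017;
      Wu et al., ICCV 2017; Deng et al., AAAI 2018;
      X Li, CVPR 2019; W Wang, et al., CVPR 2019
  – Hybrid methods (segmentation + regression)
      He et al., ICCV 2017; Lyu et al., CVPR 2018;
      Liao et al., CVPR 2018; Long et al., ECCV 2018; …
      Liu et al., IJCAI 2019; …
• Trends:
  – Horizontal rectangular boxes → multi-oriented rectangles → multi-oriented quadrilaterals → curved text → arbitrary shapes
  – Segmentation-based methods have difficulty accurately separating adjacent or overlapping text
  – Regression-based methods tend to detect long text incompletely
  – Bounding box regression methods require well-chosen anchor parameters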

SLIDE 13

Direct Regression Net

C. He, et al., Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression, TIP 2018.

SLIDE 14

TextField

YC Xu, et al., TextField: Learning A Deep Direction Field for Irregular Scene Text Detection, IEEE TIP 2019.

Text Directional Field

– A two-dimensional unit vector at each text pixel that points away from its nearest text boundary pixel

Instance-level representation
Easy to separate adjacent text instances
Post-processing is applied to produce the final detection result
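The direction field definition above can be sketched in a few lines. This is a minimal brute-force illustration, not the TextField implementation: it assumes a small binary mask, treats the nearest non-text pixel as the "nearest text boundary pixel", and the function name `direction_field` is hypothetical.

```python
import math


def direction_field(mask):
    """mask: list of rows with 1 for text pixels and 0 for background.

    Returns field[y][x] = (dx, dy): a unit vector pointing away from the
    nearest background pixel for text pixels, and (0.0, 0.0) elsewhere.
    """
    h, w = len(mask), len(mask[0])
    background = [(x, y) for y in range(h) for x in range(w) if not mask[y][x]]
    field = [[(0.0, 0.0)] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or not background:
                continue
            # Brute-force nearest background pixel (ties are broken arbitrarily).
            nx, ny = min(background, key=lambda q: (q[0] - x) ** 2 + (q[1] - y) ** 2)
            dx, dy = x - nx, y - ny
            norm = math.hypot(dx, dy)
            field[y][x] = (dx / norm, dy / norm)
    return field


mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
field = direction_field(mask)
print(field[2][1], field[2][3])  # left edge points right, right edge points left
```

Pixels on opposite sides of an instance get opposite directions, so two touching instances show a sharp direction change along their shared border; this is presumably what makes adjacent instances easy to separate in post-processing.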

SLIDE 15

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

X. Wang, et al., Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation, CVPR 2019 (Oral).

Adaptive representation of text region
CNN + LSTM

SLIDE 16

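LOMO: Look More Than Once

C. Zhang, et al., Look More Than Once: An Accurate Detector for Text of Arbitrary Shape, CVPR 2019.
Addresses long text detection and curved text detection
Introduces new modules such as IRM (for long text detection) and SEM (for curved text detection)
No complex post-processing required; end-to-end trainable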

SLIDE 17

PAN

New work from the PSENet (CVPR 2019) team
Fast, with strong performance
Handles dense long text detection and arbitrarily curved text detection
Semantic segmentation (text region, kernel, similarity vectors)
Text instance rebuilding with a learnable pixel aggregation method

W Wang, E Xie, et al., Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network, ICCV 2019

SLIDE 18

Other trends in the details (1)

• The multi-scale problem (large scale variations)

“The scale ratio between the largest and the smallest texts is up to 230 times for images in the MSRA-TD500” -- C Xue, et al., IJCAI 2019

E Richardson, et al., It’s All About The Scale! arXiv 2019.
C Xue, et al., MSR: Multi-Scale Shape Regression for STD, IJCAI 2019
YX Wang, et al., DSRN: A deep scale relationship network for …, IJCAI 2019.
W Wang, et al., PSENet, CVPR 2019
W He, Realtime multi-scale scene text detection…, PR 2019.

SLIDE 19

Other trends (2)

• Learning geometric properties of text/char/pixel regions, e.g.:
  – Char/text center line
  – Char/text border offset
  – Char/text center offset
  – Char/text vertex offset
  – Character affinity
  – Corner point
  – Visual relationship
  – …

e.g. ECCV 2018 (TextSnake), CVPR 2019 (LOMO, CRAFT),
IJCAI 2019 (MSR), TIP 2019 (TextField),
ACM MM 2019 (SAST), ICCV 2019 (ScRN)
C Ma, Z Zhong, ICDAR 2019
…

SLIDE 20

Applications of text geometric properties

M Yang, et al., ScRN, ICCV 2019 (recognition)

Y. Baek, et al., CRAFT, CVPR 2019

S Long, et al., TextSnake, ECCV 2018

P Wang, et al., SAST, MM 2019

SLIDE 21

(3) Anchor & RPN hyperparameter tuning

• Examples of anchor-free regression methods:

Segmentation based methods
C. He, et al., Direct Regression…, ICCV 2017, TIP 2018.
Z Zhong et al., An Anchor-Free Region Proposal Network …, IJDAR 2019.
Zhi Tian, Chunhua Shen, et al., FCOS, CVPR 2019
Chenchen Zhu, Yihui He, et al., FSAF, CVPR 2019
Tao Kong, Fuchun Sun, et al., FoveaBox, arXiv 2019

• Why anchor free?
  – Most RPN regression methods require well-chosen anchor parameters
    e.g. SSD → TextBoxes (AAAI 2017)

• Alternative anchor design?

Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

SLIDE 22

DeRPN

Dimension-decomposition region proposal network (DeRPN).
DeRPN uses an anchor string mechanism: objects are matched with anchor strings based on length instead of IoU.

DeRPN can be employed directly on different models, tasks, and datasets without any modification of hyperparameters or specialized optimization.

Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019.
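The idea of matching by length rather than by IoU can be sketched as follows. This is a minimal illustration under stated assumptions, not the DeRPN implementation: the anchor string values (a geometric progression) and the nearest-in-log-scale rule are assumptions made for this example, and the function names are hypothetical.

```python
import math

# Assumed anchor strings: one shared set of edge lengths, used for both
# widths and heights. The values are illustrative only.
ANCHOR_STRINGS = [16, 32, 64, 128, 256, 512, 1024]


def match_anchor_string(edge_length, anchors=ANCHOR_STRINGS):
    """Index of the anchor string closest to edge_length in log scale."""
    return min(
        range(len(anchors)),
        key=lambda i: abs(math.log(edge_length) - math.log(anchors[i])),
    )


def match_box(width, height):
    """Match width and height independently (dimension decomposition)."""
    return match_anchor_string(width), match_anchor_string(height)


# A long, thin text line: no IoU computation or aspect-ratio anchors needed.
w_idx, h_idx = match_box(300, 20)
print(ANCHOR_STRINGS[w_idx], ANCHOR_STRINGS[h_idx])  # 256 16
```

Because width and height are matched separately, an extreme aspect ratio such as 300 x 20 needs no dedicated anchor box, which is one way to read the "no hyperparameter modification" claim on the slide.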

SLIDE 23

Experiments

– Experimental Data

PASCAL VOC 2007
PASCAL VOC 2012
MS COCO
ICDAR 2013
MS COCO Text

– To verify robustness and adaptivity, the same hyperparameters were kept for DeRPN throughout all experiments, without any modification

The same model handles different tasks and different datasets without any hyperparameter tuning

SLIDE 24

Experiments


SLIDE 25

Summary of DeRPN

• Good adaptivity and generalization ability
• Able to detect objects of varying size
  – range of [𝟗 𝒓, 𝟐𝟏𝟑𝟓 𝒓]
• Regression loss of DeRPN is bounded
  – The largest deviation (ratio) between the anchor string and the object edge is at most 𝒓
• Better performance
  – Higher recall rate
  – Tighter bounding boxes (better performance at high IoU)
• Limitations
  – Can only deal with rectangular bounding boxes
  – For two-stage frameworks only

Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019. Code: https://github.com/HCIILAB/DeRPN

SLIDE 26

(4) Annotation ambiguity

• Char, word, or line
• Label sequence order

SLIDE 27

Sensitive to Label Sequence (SLS) issue

Existing methods have addressed this problem but have not solved it completely.

Solution A (CVPR17), Solution B (TIP18), Solution C (ACM MM18)

https://mp.weixin.qq.com/s/pxLR0R7tT7Rbhu-NFfv_aA

SLIDE 28

Our Approach: Sequential-invariant Box Discretization (SBD)

https://mp.weixin.qq.com/s/pxLR0R7tT7Rbhu-NFfv_aA

Yuliang Liu, et al., IJCAI 2019.

SLIDE 29

Box Discretization Network for Omni-directional Object detection

Y. Liu, S. Zhang, L. Jin, et al., Omnidirectional Scene Text Detection with Sequential-free Box Discretization, IJCAI 2019.
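Why a sequence-free representation sidesteps the label sequence issue can be shown with a small sketch. It is only an illustration of the invariance idea, assuming a quadrilateral given as four (x, y) points; the function name `key_edges` and the example coordinates are hypothetical, and the step that recovers how the x and y values pair up again is omitted here.

```python
def key_edges(quad):
    """Order-invariant description of a quadrilateral.

    quad: four (x, y) vertices in any order.
    Returns the sorted x values and the sorted y values.
    """
    xs = sorted(p[0] for p in quad)
    ys = sorted(p[1] for p in quad)
    return xs, ys


# The same box annotated with two different starting vertices
# and in reversed direction.
quad_a = [(10, 5), (90, 15), (85, 40), (5, 30)]
quad_b = quad_a[1:] + quad_a[:1]
quad_c = list(reversed(quad_a))

print(key_edges(quad_a))
print(key_edges(quad_a) == key_edges(quad_b) == key_edges(quad_c))  # True
```

A regression target built this way does not change when annotators start from a different corner, whereas regressing the four points directly gives three different targets for the same box.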

SLIDE 30

Experimental Results


Our results were produced through single scale testing

SLIDE 31

Generalization Ability of BDN

Ship detection using the same model without modifying any hyperparameter (no tuning, about 3 hours of training, reaching SOTA)
Based on this model, won first place in the ICDAR 2019 ReCTS detection task

Yuliang Liu, et al., Omnidirectional Scene Text Detection with Sequential-free Box Discretization, IJCAI 2019. Code: https://github.com/Yuliang-Liu/Box_Discretization_Network

SLIDE 32


SLIDE 33

Scene Text Recognition

• Examples of scene text recognition methods:
  – CTC-based methods
      P. He et al., AAAI 2016 (DTRN: CNN+RNN+CTC)
      B. Shi et al., TPAMI 2017 (CRNN: CNN+RNN+CTC)
      F Yin, et al., arXiv 2017 (CNN+CTC)
      Y Wu, et al., arXiv 2018 (CNN+CTC)
      Y. Liu et al., ECCV 2018 (GAN+CTC)
      …
  – Attention-based methods
      C Lee et al., CVPR 2016; B. Shi et al., CVPR 2016
      X. Yang et al., IJCAI 2017
      Bai et al., CVPR 2018; Liu et al., AAAI 2018
      Shi et al., TPAMI 2018 (ASTER);
      Luo et al., PR 2019 (MORAN);
      Li et al., AAAI 2019; …
• Recent trends:
  – Regular text → irregular text recognition
  – CTC, Attention (1D, 2D)
  – Detection + recognition → end-to-end detection and recognition

SLIDE 34

Challenges of Scene Text Recognition

1. Perspective distortion, irregular, arbitrarily oriented
2. Text line curvature
3. Arbitrary variation of text appearances / styles
   – e.g. calligraphic fonts
   – Hybrid horizontal/vertical texts
4. Different types of imaging artifacts
5. Uneven lighting
6. Complicated image background (e.g. the redundant background problem)
7. Low resolution
8. Scale diversity
9. Heavy occlusions
10. Similar and confusable characters

SLIDE 35

Examples of results from addressing some of the issues above

• Example approaches to irregular text recognition:

• Rectification-based methods, e.g. RARE, ASTER, MORAN, ESIR

B. Shi, et al., Robust scene text recognition with automatic rectification, CVPR 2016
B. Shi, M. Yang, X. Wang, et al., ASTER: An attentional scene text recognizer with flexible rectification, IEEE TPAMI 2018. (open source)
C. Luo, L. Jin, et al., “MORAN: A multi-object rectified attention network for scene text recognition,” Pattern Recognition, 2019. (open source)
Fangneng Zhan, Shijian Lu, ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification, CVPR 2019

• Methods based on 2D attention, e.g.:

Yang et al., “Learning to Read Irregular Text with Attention Mechanisms.” IJCAI 2017
Li et al., Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition, AAAI 2019
M. Liao, et al., Scene Text Recognition from Two-Dimensional Perspective, AAAI 2019
P Wang, et al., A Simple and Robust Convolutional-Attention Network for Irregular Text Recognition, ICCV 2019. (2D CNN attention)

• Character-level recognition, e.g. Char-Net, Mask TextSpotter

W Liu, et al., Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition, AAAI 2018
M. Liao et al., Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes, ECCV 2018, IEEE TPAMI 2019 (open source)

SLIDE 36

ASTER

B Shi, et al., ASTER: An Attentional Scene Text Recognizer with Flexible Rectification, IEEE TPAMI 2018.

TPS+CNN+BLSTM+Attention

SLIDE 37

Follow-up improvements on ASTER, e.g. ESIR, ScRN

M Yang, et al., Symmetry-constrained Rectification Network for Scene Text Recognition, ICCV 2019
F Zhan, S Lu, et al., ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification, CVPR 2019

SLIDE 38

MORAN: a pixel-level rectification method

C. Luo, L. Jin, et al., MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition, Pattern Recognition, 2019. Code: https://github.com/Canjie-Luo/MORAN_v2

SLIDE 39

2D Attention, Cascade Attention

X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles, Learning to read irregular text with attention mechanisms, IJCAI 2017.
Liu et al., Char-Net: A character-aware neural network for distorted scene text recognition, AAAI 2018.
H. Li, P. Wang, C. Shen, and G. Zhang, Show, attend and read: A simple and strong baseline for irregular text recognition, AAAI 2019.
M. Liao, et al., Scene Text Recognition from Two-Dimensional Perspective, AAAI 2019.
Y Huang, et al., Attention after attention: Reading Text in the Wild with Cross Attention, ICDAR 2019.

SLIDE 40

CNN Attention

YC Wu, F Yin, XY Zhang, L Liu, CL Liu, SCAN: Sliding Convolutional Attention Network for Scene Text Recognition, arXiv 2018.

Convolutional encoder
Convolutional decoder
Fully parallel training
Multi-scale sliding windows

SLIDE 41

Self Attention

P Wang, L Yang, et al., Simple and Robust Convolutional-Attention Network for Irregular Text Recognition, arXiv 2019

Transformer based model
2D CNN image encoder
2D attention
Fully parallel training
Can deal with irregular text recognition

SLIDE 42

2D CTC

Z. Wan, et al., 2D-CTC for Scene Text Recognition, arXiv 2019

Probability distribution map
Path transition map
Fast; can handle curved text recognition

SLIDE 43



Attention or CTC?

F Cong, W Hu, Q Huo, L Guo, A Comparative Study of Attention-Based Encoder-Decoder Approaches to Natural Scene Text Recognition, ICDAR 2019

H Ding, K Chen, et al., ICDAR 2017

SLIDE 44

Limitations of Attention and CTC

• CTC:
  – Can hardly be applied directly to 2D prediction
  – Large computation for long sequences
  – Performance degradation for repeated patterns
• Attention:
  – Misalignment problem (attention drift)
  – More memory required
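For readers less familiar with CTC, a minimal sketch of greedy (best-path) CTC decoding shows where repeated characters need special handling. It assumes per-frame class probabilities with index 0 reserved for the blank symbol; the helper names and the toy alphabet are hypothetical and chosen only for illustration.

```python
def ctc_greedy_decode(frame_probs, alphabet, blank=0):
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    out, prev = [], None
    for k in best:
        if k != prev and k != blank:
            out.append(alphabet[k])
        prev = k
    return "".join(out)


def one_hot(k, n):
    """Toy frame distribution that puts all probability on class k."""
    return [1.0 if i == k else 0.0 for i in range(n)]


alphabet = ["-", "t", "o"]  # index 0 is the blank symbol

with_blank = [one_hot(k, 3) for k in [1, 1, 2, 0, 2, 2]]  # t t o - o o
without_blank = [one_hot(k, 3) for k in [1, 2, 2]]        # t o o

print(ctc_greedy_decode(with_blank, alphabet))     # too
print(ctc_greedy_decode(without_blank, alphabet))  # to
```

A doubled character survives only if a blank frame separates the two occurrences, so text with many repeats needs more frames and cleaner blank predictions; this may be one reason behind the degradation on repeated patterns noted above.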

SLIDE 45

Alternative to CTC & Attention: ACE

• Aggregation cross-entropy (ACE)
  – Aggregate the probability of each class along the time dimension
  – Regard the accumulated class predictions and the label annotation as probability distributions over all the classes
  – Compare these two probability distributions using cross-entropy

Z Xie, et al., Aggregation Cross-Entropy for Sequence Recognition, CVPR 2019 (oral). Code: https://github.com/summerlvsong/Aggregation-Cross-Entropy
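The three steps above can be sketched in plain Python. This is a minimal illustration rather than the released code: it assumes per-frame class probabilities, class index 0 as blank, a label no longer than the number of frames T, and that the blank target count is T minus the label length. The function name `ace_loss` and the toy data are hypothetical.

```python
import math
from collections import Counter


def ace_loss(frame_probs, label, num_classes, blank=0):
    """Aggregation cross-entropy for one sequence.

    frame_probs: T lists of per-class probabilities.
    label: list of class indices (without blanks), len(label) <= T.
    """
    T = len(frame_probs)
    # Step 1: aggregate each class probability over time, normalised by T.
    aggregated = [sum(p[k] for p in frame_probs) / T for k in range(num_classes)]
    # Step 2: turn the label into per-class counts, normalised by T.
    counts = Counter(label)
    counts[blank] = T - len(label)
    # Step 3: cross-entropy between the two distributions.
    eps = 1e-10
    return -sum((counts[k] / T) * math.log(aggregated[k] + eps)
                for k in range(num_classes))


def one_hot(k, n):
    return [1.0 if i == k else 0.0 for i in range(n)]


good = [one_hot(k, 3) for k in [1, 2, 0, 0]]  # predicts classes 1 and 2 once each
bad = [one_hot(1, 3) for _ in range(4)]       # predicts class 1 in every frame

print(ace_loss(good, [1, 2], 3) < ace_loss(bad, [1, 2], 3))  # True
```

The loss depends only on how often each class occurs, not on where, so it needs no alignment between frames and characters; the same property means the character order has to come from the per-frame predictions at decoding time.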

SLIDE 46


SLIDE 47

Why End2End?

• Prevents errors from accumulating
  – Errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
• Joint optimization helps improve overall performance
• Easier to maintain and adapt to new domains
  – Maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
• Faster, smaller, stronger

Xuebo Liu et al., FOTS: Fast Oriented Text Spotting with a Unified Network, CVPR 2018

SLIDE 48

Issues of end-to-end STR

• Feature sharing between detector and recognizer
• Connection between detector and recognizer
• Joint optimization
• Reading order
  – Left to right; top to bottom; …
• Significant differences in learning difficulty and convergence rate
  – Different training sample requirements, training difficulty, and training time
      Detector: 1,000 – 10,000 samples
      Recognizer: 100,000 to millions of samples, or more

SLIDE 49

Some new techniques to bridge detector and recognizer

• RoI Rotate (multi-oriented e2e)
  – X Liu, et al., FOTS, CVPR 2018
• Text-alignment (multi-oriented e2e)
  – T. He, et al., TextSpotter, CVPR 2018
• Tailored RoI pooling (resampling that preserves the aspect ratio)
  – H Li et al., Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of “H Li et al., ICCV 2017”)
• Perspective RoI transform (arbitrary-shape e2e)
  – Y Sun, et al., TextNet, ACCV 2018
• RoI Align + char segmentation + SAM (arbitrary-shape e2e)
  – M Liao, P Lyu, et al., Mask TextSpotter, TPAMI 2019
• RoI Masking (arbitrary-shape e2e)
  – S Qin, A Bissacco, et al. (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019

SLIDE 50

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Network architecture

By extending the dimensionality of the Mask branch output of Mask R-CNN, the network detects text and recognizes its content end to end.

Mask branch
Character Segmentation branch
Extension in the TPAMI version: introduces the SAM module

Pengyuan Lyu*, Minghui Liao*, et al., Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes, ECCV 2018; TPAMI 2019

SLIDE 51

Towards Unconstrained End-to-End Text Spotting

• Mask R-CNN framework + attention decoder
• RoI Masking
• Uses millions of external samples, with OCR engine output as annotation (partial labelling)
  – The results are therefore not comparable with those of other methods

S Qin, A Bissacco, et al. (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019

SLIDE 52

Discussion & Future Prospects

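SLIDE 53

01 Generalization ability

• Even without having learned it, most people can still accurately recognize the text in the image on the left
• Most pharmacists can easily read the prescription in the image below
• Generalization ability?
• NLP + OCR?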

SLIDE 54

02 OCR & Knowledge & NLP

• Zero-shot learning. Can you guess which character is carved here?

SLIDE 55

03 Evaluation Issue

Consistency of training and testing datasets
Consistent evaluation
Is end-to-end evaluation necessary to judge a detection method?

Accepted by ICCV 2019 (oral)

SLIDE 56

Evaluation metric issue

The weakness of the current IoU metric
Y. Liu, L. Jin, et al., Tightness-aware Evaluation Protocol for Scene Text Detection, CVPR 2019. Code: https://github.com/Yuliang-Liu/TIoU-metric

Some recent related works

– CY Lee, Y Baek, H Lee, TedEval: A Fair Evaluation Metric for Scene Text Detectors, arXiv 201907
– HS Lee, Y Yoon, et al., PopEval: A Character-Level Approach to End-To-End Evaluation Compatible with Word-Level Benchmark Dataset, ICDAR 2019 Workshop

SLIDE 57

04 Data Issue

• The performance of deep learning based approaches depends heavily on high-quality data
• Do we have enough data?
  ICDAR 2011/13/15, MSRA TD500, SVTP, CUTE80, CocoText, MLT 17/19, RCTW, SCUT-CTW1500, TotalText, MTWI, LSVT, ReCTS, ArT…
• What is the next good and challenging dataset?
• TextNet-1M? 1 million text images that contain:
  – Scene text in the wild
  – Document text (in the wild)
  – Handwritten and printed text
  – Multi-lingual text (e.g. English + Chinese)
  – Diverse, wide, and extensive coverage: invoice, menu, poster, driver license card, product label, receipt, slide, street view…

SLIDE 58

Data Synthesis

• Synth90k (MJSynth)
  – M. Jaderberg et al., Synthetic data and artificial neural networks for natural scene text recognition, NIPS 2014
• SynthText
  – A. Gupta et al., Synthetic data for text localisation in natural images, CVPR 2016
• Verisimilar
  – Zhan et al., Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes, ECCV 2018
• SF-GAN
  – F. Zhan, H. Zhu, S. Lu, Spatial Fusion GAN for Image Synthesis, CVPR 2019
• SynthText3D
  – M. Liao, et al., SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds, arXiv 20190713

SLIDE 59

Data Synthesis

Does synthesized data still look somewhat fake?

SLIDE 60

Crucial role of real data

H. Li, P. Wang, C. Shen, G. Zhang, Show, Attend and Read: A simple and strong baseline for irregular text recognition, AAAI 2019

Training data   IIIT5K   SVT    IC13   IC15   SVTP   CT80   COCO-T
OnlySynth       91.5     84.5   91.0   69.2   76.4   83.3   -
Synth+Real      95.0     91.2   94.0   78.8   86.4   89.6   66.8

Must we train models on synthetic data in order to evaluate the various algorithms?

SLIDE 61

New dataset, new challenge, new opportunity

• ICDAR 2019 LSVT
  – 400,000 partially annotated images (massive!)
  – Train: 30000; Test: 20000
• ICDAR 2019 ArT
  – Curved text, irregular shapes
  – Train: 5603; Test: 4563
• MLT 2019
  – Multilingual: 10 languages
  – Train: 10000; Test: 10000
• ICDAR 2019 ReCTS
  – Street-view signboards; 600,000 samples with character-level annotation
  – Train: 20000; Test: 5000
• ICDAR 2019 ST-VQA
  – Text VQA

SLIDE 62

ICDAR 2019 ArT: new opportunity

• We received an overwhelming number of submissions
  – A total of 78 submissions from 46 unique teams/individuals were received
• The top performing scores of each task
  – i) T1: 82.65% (detection, H-mean)
  – ii) T2.1: 74.30% (recognition, Latin script only, word accuracy)
  – iii) T2.2: 85.32% (recognition, Latin & Chinese, 1-NED)
  – iv) T3.1: 53.86% (end-to-end text spotting, Latin script only, 1-NED)
  – v) T3.2: 54.91% (end-to-end text spotting, Latin & Chinese, 1-NED)
• End-to-end text spotting seems to be the most challenging task (1-NED < 55%)
• There is still much room and research opportunity for further improvement
  – esp. recognition; end-to-end text spotting
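The two metric names used in the scores above, H-mean and 1-NED, can be sketched as follows. This is a minimal illustration under stated assumptions: the edit distance is normalised by the length of the longer string, which is an assumption here, and the competition's official protocol should be consulted for the exact definition. The function names are hypothetical.

```python
def h_mean(precision, recall):
    """Harmonic mean of precision and recall (the detection H-mean)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def one_minus_ned(predictions, ground_truths):
    """1 - mean normalised edit distance over paired strings.

    Assumption: each distance is divided by the longer string's length.
    """
    total = 0.0
    for pred, gt in zip(predictions, ground_truths):
        denom = max(len(pred), len(gt))
        total += edit_distance(pred, gt) / denom if denom else 0.0
    return 1.0 - total / len(predictions)


print(one_minus_ned(["abcd"], ["abcf"]))  # 0.75
```

Unlike word accuracy, 1-NED gives partial credit for nearly correct strings, which matters for long Chinese text lines where a single wrong character would otherwise count as a complete miss.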

SLIDE 63

05 Gap between research-end and industry-end (or application-end)

Real-world industry scenarios vs. scenarios defined by academia

SLIDE 64

Are the scenarios defined by academia consistent with those in industry?

Real-world industry scenarios vs. scenarios defined by academia

SLIDE 65

06 New topic: Document Structure

• Chargrid model for document understanding
• Information extraction from invoice documents

AR Katti, et al., Chargrid: Towards Understanding 2D Documents, EMNLP 2018.
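The chargrid idea, keeping the 2D layout by writing each character's index into the region it occupies, can be sketched as follows. This is a simplified illustration, not the authors' code: the input format (character plus integer box), the vocabulary, and the function name `build_chargrid` are assumptions made for this example.

```python
def build_chargrid(chars, width, height, vocab):
    """Rasterise OCR characters into a 2D grid of character indices.

    chars: list of (character, (x1, y1, x2, y2)) with integer pixel boxes.
    vocab: maps a character to an index >= 2; 0 is background, 1 is unknown.
    """
    grid = [[0] * width for _ in range(height)]
    for ch, (x1, y1, x2, y2) in chars:
        idx = vocab.get(ch, 1)
        for y in range(max(0, y1), min(height, y2)):
            for x in range(max(0, x1), min(width, x2)):
                grid[y][x] = idx
    return grid


# Hypothetical OCR output for a tiny 4 x 3 "page".
vocab = {"A": 2, "B": 3}
chars = [("A", (0, 0, 2, 2)), ("B", (2, 0, 4, 2))]

for row in build_chargrid(chars, width=4, height=3, vocab=vocab):
    print(row)
# [2, 2, 3, 3]
# [2, 2, 3, 3]
# [0, 0, 0, 0]
```

The resulting grid can be fed to an image-style network (typically after one-hot encoding), so that fields such as invoice totals are located using both the characters and their positions on the page.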

SLIDE 66

Document structuring (OCR + GNN + NLP)

• Document structure & entity extraction
  – e.g. the “Key-Value” issue

X Liu, F Gao, et al., Graph Convolution for Multimodal Information Extraction…, NAACL 2019.

SLIDE 67

07 New topic: Text VQA

Aniruddha Kembhavi, et al., Are you smarter than a sixth grader? Textbook question answering…, CVPR 2018.
A Singh, V Natarajan, et al., Towards VQA models that can read, CVPR 2019
Scene Text Visual Question Answering, ICCV 2019
Anand Mishra, et al., OCR-VQA: Visual Question Answering by Reading Text in Images, ICDAR 2019
A Biten, et al., Scene Text Visual Question Answering, arXiv 2019.
https://rrc.cvc.uab.es/?ch=11: ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering

SLIDE 68

08 New topic: End2end Scene Text Removal

ST Zhang, YL Liu, LW Jin, et al., EnsNet: Ensconce Text in the Wild, AAAI 2019.
Dataset: https://github.com/HCIILAB/Scene-Text-Removal
Code: coming soon…

T Nakamura, et al., Scene Text Eraser, ICDAR 2017.
O Tursun, et al., MTRNet: A Generic Scene Text Eraser, arXiv 2019
L Wu, C Zhang, et al., Editing Text in the Wild, ACM MM 2019.
P Roy, STEFANN: Scene Text Editor using Font Adaptive Neural Network, arXiv 2019

SLIDE 69

08 Editing Text in the Wild

L Wu, C Zhang, et al., Editing Text in the Wild, ACM MM 2019.

SRNet: Style retention network

SLIDE 70

09 Handwriting in the Wild?

SLIDE 71

10 Other Trends

• Semi- or weakly-supervised learning
  – X Qin, Y Zhou, et al., Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning, ICDAR 2019
  – YP Sun, et al., Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning, ICCV 2019.
  – S Qin, A. Bissacco, Towards Unconstrained End-to-End Text Spotting, ICCV 2019
      Uses generic OCR results for weakly supervised learning to improve performance
  – Z Xie, et al., Weakly supervised precise segmentation for historical document images, Neurocomputing, 2019.
      Trains a character classifier with weak supervision to improve text detection performance at high IoU
• Adversarial learning, e.g.:
  – AK Bhunia, et al., Handwriting Recognition in Low-resource Scripts using Adversarial Learning, CVPR 2019
      Uses adversarial learning for sample augmentation (adversarial augmentation in feature space)

SLIDE 72

10 Safe Machine Learning

• Robustness
  – Adaptation, worst-case robustness, self-exploration, withstanding adversarial attacks, …
• Specification
  – Bias of data, alignment between model and human preferences, …
• Assurance
  – Monitoring/controlling of the system, interpretability
  – Knowing when it knows or does not know (e.g. confidence measurement)
• Confidence / out-of-distribution (OOD) detection issues

HM Yang, XY Zhang, F Yin, CL Liu, Robust Classification with Convolutional Prototype Learning, CVPR 2018
M Hein, et al., Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem, CVPR 2019.
S Vernekar, et al., Analysis of Confident-Classifiers for Out-of-distribution Detection, SafeML ICLR 2019 Workshop.
A. Meinke, M. Hein, Towards neural networks that provably know when they don’t know, arXiv 201909. (knowing that you do not know…)

• SafeML ICLR 2019 Workshop, https://sites.google.com/view/safeml-iclr2019
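As a point of reference for the confidence measurement item above, the simplest baseline is to reject a prediction when the maximum softmax probability is low. The sketch below is only that baseline and is not the method of any paper cited on this slide; the threshold value and function names are arbitrary assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_or_reject(logits, threshold=0.9):
    """Return (class index, confidence), or (None, confidence) if unsure."""
    probs = softmax(logits)
    k = max(range(len(probs)), key=probs.__getitem__)
    if probs[k] >= threshold:
        return k, probs[k]
    return None, probs[k]


print(predict_or_reject([10.0, 0.0, 0.0])[0])  # 0
print(predict_or_reject([1.0, 1.1, 0.9])[0])   # None
```

As the title of the Hein et al. paper listed above indicates, ReLU networks can still output high confidence far away from the training data, so a threshold of this kind is a starting point rather than a guarantee.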

SLIDE 73

Thank you

Lianwen Jin (金连文, JIN Lianwen)

eelwjin@scut.edu.cn lianwen.jin@gmail.com
