2019 10 16 Outline - - PowerPoint PPT Presentation

2019 10 16 Outline Introduction Recent Progress and Trends Scene Text Detection Scene Text Recognition End2End Scene Text


slide-1
SLIDE 1

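A Brief Discussion of Text Recognition: New Observations, New Thinking, New Opportunities

JIN Lianwen (金连文)

South China University of Technology

October 16, 2019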

slide-2
SLIDE 2

Outline

  • Introduction
  • Recent Progress and Trends
    – Scene Text Detection
    – Scene Text Recognition
    – End2End Scene Text Recognition
  • Discussion and Future Prospects
slide-3
SLIDE 3

Text is the most important carrier for exchanging information and perceiving the world

Text images are everywhere in daily life

slide-4
SLIDE 4

Without text, we can hardly understand the world

slide-5
SLIDE 5

Without text, we sometimes can hardly understand an image

slide-6
SLIDE 6

Text makes our lives rich and colorful…

slide-7
SLIDE 7

The Importance of Text

Given an image that contains text, in the vast majority of cases the text is the most informative part of the image!

slide-8
SLIDE 8

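DAR, OCR, STR

  • Document Analysis & Recognition (DAR)
  • Optical Character Recognition (OCR)
    – Scene text detection and recognition (Scene Text Recognition, STR)
  • Online Handwritten Character Recognition (Online HCR)

  • Online HCR: online handwritten text recognition, online signature and handwriting identification, online mathematical expression recognition, …
  • OCR: handwritten text recognition, printed text recognition
    – Character, text line, full page, complex layout, …
    – Newspapers and books, scanned documents, ID documents and license plates, forms and business cards, …
    – Scene text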

slide-9
SLIDE 9

Scene Text Detection and Recognition (STR)

  • Text detection
  • Text recognition
    – Character / word / text line
  • End-to-end recognition

slide-10
SLIDE 10

slide-11
SLIDE 11

Challenges of Scene Text Detection

1. Arbitrarily oriented
2. Irregular text, perspective distortion
3. Scale diversity
4. Ambiguity of annotation: Char, Word, Text; label sequence order
5. Completeness and tightness: IoU >= 0.5?
6. Arbitrary variation of text appearances
7. Different types of imaging artifacts
8. Complicated image background
9. Uneven lighting
10. Low resolution
11. Heavy overlay
12. Long text detection
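
Challenge 5 questions whether the usual IoU >= 0.5 criterion really captures completeness and tightness. The sketch below illustrates that concern under simplifying assumptions: boxes are axis-aligned and given as (x1, y1, x2, y2), and the function name `box_iou` and all coordinates are hypothetical values chosen for illustration, not taken from the talk.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = (0, 0, 100, 20)        # hypothetical ground-truth word box
cut = (0, 0, 60, 20)        # detection that misses 40% of the word
loose = (-10, -5, 110, 25)  # detection that covers the word plus background

iou_cut = box_iou(gt, cut)      # 1200 / 2000 = 0.6
iou_loose = box_iou(gt, loose)  # 2000 / 3600, roughly 0.556
```

Both detections pass the 0.5 threshold, although the first truncates the word (incomplete) and the second includes extra background (not tight), which is the kind of case the question mark on the slide points at.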

slide-12
SLIDE 12

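Scene Text Detection

Examples of scene text detection methods:

  • Regression-based methods
    – Gupta et al., CVPR 2016; Tian et al., ECCV 2016
    – Shi, Bai et al., CVPR 2017; Liu et al., CVPR 2017
    – Liao et al., AAAI 2017; Hu et al., ICCV 2017; …
  • Segmentation-based methods
    – Zhong et al., CVPR 2016; Zhou et al., CVPR 2017
    – Wu et al., ICCV 2017; Deng et al., AAAI 2018
    – X Li, CVPR 2019; W Wang, et al., CVPR 2019
  • Hybrid methods (segmentation + regression)
    – He et al., ICCV 2017; Lyu et al., CVPR 2018
    – Liao et al., CVPR 2018; Long et al., ECCV 2018; …
    – Liu et al., IJCAI 2019; …

Trends:

  • Horizontal rectangular boxes → multi-oriented rectangular boxes → multi-oriented quadrilaterals → curved text → arbitrary shapes
  • Segmentation-based methods have difficulty accurately separating adjacent or overlapping text
  • Regression-based methods have difficulty detecting long text completely
    – Bounding-box regression methods require properly set anchor parameters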
slide-13
SLIDE 13

Direct Regression Net

  • C. He, et al., Multi-Oriented and Multi-Lingual Scene Text Detection With Direct Regression, TIP 2018.
slide-14
SLIDE 14

TextField

YC Xu, et al., TextField: Learning A Deep Direction Field for Irregular Scene Text Detection, IEEE TIP 2019.

  • Text Directional Field

– a two-dimensional unit vector that points away from its nearest text boundary pixel

  • Instance-level representation
  • Easy to separate adjacent text instances
  • Post processing is applied to produce the final detection result
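
The direction field described above can be sketched on a toy mask. This is only an illustration under stated assumptions, not the TextField implementation: the nearest non-text pixel stands in for the "nearest text boundary pixel", the search is brute force, and the function name `direction_field` and the (dy, dx) vector layout are hypothetical choices.

```python
import numpy as np


def direction_field(mask):
    """Return an (H, W, 2) field of (dy, dx) unit vectors for a boolean text mask.

    Each text pixel gets the unit vector pointing from its nearest non-text
    pixel towards itself; non-text pixels keep the zero vector.
    """
    mask = np.asarray(mask, dtype=bool)
    field = np.zeros(mask.shape + (2,), dtype=float)
    background = np.argwhere(~mask)        # (K, 2) coordinates of non-text pixels
    if background.size == 0:
        return field
    for p in np.argwhere(mask):
        offsets = p - background           # vectors from every non-text pixel to p
        dist2 = (offsets ** 2).sum(axis=1)
        nearest = offsets[np.argmin(dist2)]
        field[tuple(p)] = nearest / np.sqrt(dist2.min())
    return field


# Toy example: a 3 x 5 text block inside a 5 x 7 image.
mask = np.zeros((5, 7), dtype=bool)
mask[1:4, 1:6] = True
field = direction_field(mask)
```

In this toy case the pixel at the left edge of the block points right and the pixel at the top edge points down, so vectors from two touching text instances would point in opposing directions near their shared border, which is what makes adjacent instances separable.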
slide-15
SLIDE 15

Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation

  • X. Wang, et al., Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation, CVPR 2019 (Oral).

  • Adaptive representation of text region
  • CNN + LSTM
slide-16
SLIDE 16

LOMO: Look more than once

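  • C. Zhang, et al., Look More Than Once: An Accurate Detector for Text of Arbitrary Shape, CVPR 2019.
  • Addresses long text detection and curved text detection
  • Introduces new modules such as IRM (for long text detection) and SEM (for curved text detection)
  • No complex post-processing required; end-to-end trainable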
slide-17
SLIDE 17

PAN

  • New work from the PSENet (CVPR 2019) team
  • Fast, with good performance
  • Addresses dense long text detection and arbitrarily curved text detection
  • Semantic segmentation (text region, kernel, similarity vectors)
  • Text instance rebuilding with a learnable Pixel Aggregation method

W Wang, E Xie, et al., Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network, ICCV 2019

slide-18
SLIDE 18

Other Detailed Trends (1)

  • Multi-scale problem (large scale variations)
    – “The scale ratio between the largest and the smallest texts is up to 230 times for images in the MSRA-TD500” (C Xue, et al., IJCAI 2019)
    – E Richardson, et al., It’s All About The Scale! arXiv 2019.
    – C Xue, et al., MSR: Multi-Scale Shape Regression for STD, IJCAI 2019
    – YX Wang, et al., DSRN: A deep scale relationship network for …, IJCAI 2019.
    – W Wang, et al., PSENet, CVPR 2019
    – W He, Realtime multi-scale scene text detection…, PR 2019.

slide-19
SLIDE 19

Other Trends (2)

  • Learning geometric properties of text/char/pixel regions, e.g.:
    – Char/text center line
    – Char/text border offset
    – Char/text center offset
    – Char/text vertex offset
    – Character affinity
    – Corner point
    – Visual relationship
    – …
  • e.g., ECCV 2018 (TextSnake), CVPR 2019 (LOMO, CRAFT), IJCAI 2019 (MSR), TIP 2019 (TextField), ACM MM 2019 (SAST), ICCV 2019 (ScRN); C Ma, Z Zhong, ICDAR 2019

slide-20
SLIDE 20

Applications of Text Geometric Properties

  • M Yang, ScRN, ICCV 2019 (recognition)
  • Y. Baek, et al., CRAFT, CVPR 2019
  • S Long, et al., TextSnake, ECCV 2018
  • P Wang, et al., SAST, MM 2019

slide-21
SLIDE 21

(3) Anchor & RPN Hyperparameter Tuning Issue

  • Examples of anchor-free regression methods:
    – Segmentation based methods
    – C. He, et al., Direct Regression…, ICCV 2017, TIP 2018.
    – Z Zhong et al., An Anchor-Free Region Proposal Network …, IJDAR 2019.
    – Zhi Tian, Chunhua Shen, et al., FCOS, CVPR 2019
    – Chenchen Zhu, Yihui He, et al., FSAF, CVPR 2019
    – Tao Kong, Fuchun Sun, et al., FoveaBox, arXiv 2019
  • Why anchor free?
    – Most RPN regression methods require properly set anchor parameters
    – e.g., SSD → TextBoxes (AAAI 2017)
  • Alternative anchor design?
    – Lele Xie, Yuliang Liu, Lianwen Jin, Zecheng Xie, DeRPN: Taking a further step toward more general object detection, AAAI 2019.

slide-22
SLIDE 22

DeRPN

  • Dimension-decomposition region proposal network (DeRPN).
  • DeRPN uses an anchor string mechanism: objects are matched with anchor strings based on length instead of IoU.
  • DeRPN can be employed directly on different models, tasks, and datasets without any modification of hyperparameters or specialized optimization.

Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019.
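
The idea of matching by length rather than IoU can be sketched as follows. This is a simplified illustration, not the DeRPN implementation: the anchor string values (a geometric progression 16, 32, …, 1024 with ratio 2), the nearest-in-log-scale rule, and the function names `match_edge` and `match_box` are assumptions made for this example.

```python
import math

# Assumed anchor strings: a geometric progression with common ratio 2.
ANCHOR_STRINGS = [16 * 2 ** i for i in range(7)]   # 16, 32, ..., 1024


def match_edge(length, anchors=ANCHOR_STRINGS):
    """Return the anchor string closest to `length` in log scale."""
    return min(anchors, key=lambda a: abs(math.log(length) - math.log(a)))


def match_box(width, height):
    """Match width and height independently (dimension decomposition)."""
    return match_edge(width), match_edge(height)


# A long, thin text line: no single 2D anchor box needs to fit it,
# because each dimension is matched on its own.
matched = match_box(300, 20)   # (256, 16)
```

Because each edge is matched to its nearest anchor string in log scale, the ratio between an edge and its matched string stays below the square root of the progression ratio for edges within the covered range, so extreme aspect ratios such as text lines need no hand-tuned anchor shapes.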

slide-23
SLIDE 23

Experiments

– Experimental Data

  • PASCAL VOC 2007
  • PASCAL VOC 2012
  • MS COCO
  • ICDAR 2013
  • MS COCO Text

– To verify the robustness & adaptivity, we maintained the same hyperparameters for DeRPN throughout all of our experiments without any modifications

  • The same model handles different tasks and different datasets without any hyperparameter tuning
slide-24
SLIDE 24

Experiments

slide-25
SLIDE 25

Summary of DeRPN

  • Good adaptivity and generalization ability
  • Able to detect objects of various sizes
    – range of [8√q, 1024√q]
  • Regression loss of DeRPN is bounded
    – The largest deviation (ratio) between the anchor string and the object edge is at most √q
  • Better performance
    – Higher recall rate
    – Tighter bounding boxes (better performance at high IoU)
  • Limitations
    – Can only deal with rectangular bounding boxes
    – For two-stage frameworks only

Lele Xie, et al., DeRPN: Taking a further step toward more general object detection, AAAI 2019. Code: https://github.com/HCIILAB/DeRPN

slide-26
SLIDE 26

(4) Annotation Ambiguity Issue

  • Char, word, or line
  • Label sequence order

slide-27
SLIDE 27

Sensitive to Label Sequence (SLS) issue

  • Existing methods have addressed this problem but have not solved it completely.

Solution A (CVPR 2017); Solution B (TIP 2018); Solution C (ACM MM 2018)

https://mp.weixin.qq.com/s/pxLR0R7tT7Rbhu-NFfv_aA

slide-28
SLIDE 28

Our Approach: Sequential-invariant Box Discretization (SBD)

https://mp.weixin.qq.com/s/pxLR0R7tT7Rbhu-NFfv_aA

Yuliang Liu, et al., IJCAI 2019.

slide-29
SLIDE 29

Box Discretization Network for Omni-directional Object detection

  • Y. Liu, S. Zhang, L. Jin, et al., Omnidirectional Scene Text Detection with Sequential-free Box Discretization, IJCAI 2019.

slide-30
SLIDE 30

Experimental Results

  • Our results were produced through single scale testing
slide-31
SLIDE 31

Generalization Ability of BDN

  • Ship detection using the same model without modifying any hyperparameter (no tuning, about 3 hours of training, reaches SOTA)
  • Built on this model, the team won first place in the ICDAR 2019 ReCTS detection task

Yuliang Liu, et al., Omnidirectional Scene Text Detection with Sequential-free Box Discretization, IJCAI 2019. Code: https://github.com/Yuliang-Liu/Box_Discretization_Network
slide-32
SLIDE 32

slide-33
SLIDE 33

Scene Text Recognition

Examples of scene text recognition methods:

  • CTC-based methods
    – P. He et al., AAAI 2016 (DTRN: CNN+RNN+CTC)
    – B. Shi et al., TPAMI 2017 (CRNN: CNN+RNN+CTC)
    – F Yin, et al., arXiv 2017 (CNN+CTC)
    – Y Wu, et al., arXiv 2018 (CNN+CTC)
    – Y. Liu et al., ECCV 2018 (GAN+CTC)
  • Attention-based methods
    – C Lee et al., CVPR 2016; B. Shi et al., CVPR 2016
    – X. Yang et al., IJCAI 2017
    – Bai et al., CVPR 2018; Liu et al., AAAI 2018
    – Shi et al., TPAMI 2018 (ASTER)
    – Luo et al., PR 2019 (MORAN)
    – Li et al., AAAI 2019; …

Recent trends:

  • Regular text → irregular text recognition
  • CTC → Attention (1D, 2D)
  • Detection + recognition → end-to-end detection and recognition

slide-34
SLIDE 34

Challenges of Scene Text Recognition

1. Perspective distortion, irregular, arbitrarily oriented
2. Text line curvature
3. Arbitrary variation of text appearances / styles, e.g., calligraphic fonts, hybrid horizontal/vertical texts
4. Different types of imaging artifacts
5. Uneven lighting
6. Complicated image background (e.g., the redundant background problem)
7. Low resolution
8. Scale diversity
9. Heavy occlusions
10. Similar and confusable characters

slide-35
SLIDE 35

Examples of Results from Addressing Some of the Above Issues

Examples of approaches to irregular text recognition:

  • Rectification-based methods, e.g., RARE, ASTER, MORAN, ESIR
    – B. Shi, X. Wang, et al., Robust scene text recognition with automatic rectification, CVPR 2016
    – B. Shi, M. Yang, X. Wang, et al., ASTER: An attentional scene text recognizer with flexible rectification, IEEE TPAMI 2018 (open source)
    – C. Luo, L. Jin, et al., MORAN: A multi-object rectified attention network for scene text recognition, Pattern Recognition, 2019 (open source)
    – Fangneng Zhan, Shijian Lu, ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification, CVPR 2019
  • 2D-attention-based methods, e.g.:
    – Yang et al., Learning to Read Irregular Text with Attention Mechanisms, IJCAI 2017
    – Li et al., Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition, AAAI 2019
    – M. Liao, et al., Scene Text Recognition from Two-Dimensional Perspective, AAAI 2019
    – P Wang, et al., A Simple and Robust Convolutional-Attention Network for Irregular Text Recognition, ICCV 2019 (2D CNN attention)
  • Character-level recognition methods, e.g., Char-Net, Mask TextSpotter
    – W Liu, et al., Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition, AAAI 2018
    – M. Liao et al., Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes, ECCV 2018, IEEE TPAMI 2019 (open source)

slide-36
SLIDE 36

ASTER

B Shi, et al., ASTER: An Attentional Scene Text Recognizer with Flexible Rectification, IEEE TPAMI 2018.

TPS+CNN+BLSTM+Attention

slide-37
SLIDE 37

Examples of follow-up improvements to ASTER: ESIR, ScRN

  • M Yang, et al., Symmetry-constrained Rectification Network for Scene Text Recognition, ICCV 2019
  • F Zhan, S Lu, et al., ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification, CVPR 2019

slide-38
SLIDE 38

MORAN: A Pixel-Level Rectification Method

  • C. Luo, L. Jin, et al., MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition, Pattern Recognition, 2019. Code: https://github.com/Canjie-Luo/MORAN_v2

slide-39
SLIDE 39

2D Attention, Cascade Attention

  • X. Yang, D. He, Z. Zhou, D. Kifer, and C. L. Giles, Learning to read irregular text with attention mechanisms, IJCAI 2017.
  • Liu et al., Char-Net: A character-aware neural network for distorted scene text recognition, AAAI 2018.
  • H. Li, P. Wang, C. Shen, and G. Zhang, Show, attend and read: A simple and strong baseline for irregular text recognition, AAAI 2019.
  • M. Liao, et al., Scene Text Recognition from Two-Dimensional Perspective, AAAI 2019.
  • Y Huang, et al., Attention after attention: Reading Text in the Wild with Cross Attention, ICDAR 2019.

slide-40
SLIDE 40

CNN Attention

YC Wu, F Yin, XY Zhang, L Liu, CL Liu, SCAN: Sliding Convolutional Attention Network for Scene Text Recognition, arXiv 2018.

  • Convolutional encoder
  • Convolutional decoder
  • Fully parallel training
  • Multi-scale sliding windows

slide-41
SLIDE 41

Self Attention

P Wang, L Yang, et al., Simple and Robust Convolutional-Attention Network for Irregular Text Recognition, arXiv 2019

  • Transformer based model
  • 2D CNN image encoder
  • 2D attention
  • Fully parallel training
  • Can deal with irregular text recognition

slide-42
SLIDE 42

2D CTC

  • Z. Wan, et al., 2D-CTC for Scene Text Recognition, arXiv 2019
  • Probability distribution map; path transition map
  • Fast; can handle curved text recognition

slide-43
SLIDE 43

Attention or CTC?

  • F Cong, W Hu, Q Huo, L Guo, A Comparative Study of Attention-Based Encoder-Decoder Approaches to Natural Scene Text Recognition, ICDAR 2019
  • H Ding, K Chen, et al., ICDAR 2017

slide-44
SLIDE 44

Limitations of Attention and CTC

  • CTC:
    – Can hardly be applied directly to 2D prediction
    – Large computation involved for long sequences
    – Performance degradation for repeated patterns
  • Attention:
    – Misalignment problem (attention drift)
    – More memory required
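
One reason repeated patterns are awkward for CTC can be seen from its collapsing rule, sketched here. The blank symbol `-` and the function name `ctc_collapse` are assumptions for this illustration; the rule itself (merge consecutive duplicates, then drop blanks) is the standard CTC mapping from a frame-level path to a label sequence.

```python
from itertools import groupby

BLANK = "-"   # assumed blank symbol for this illustration


def ctc_collapse(path):
    """Merge consecutive duplicate symbols, then remove blanks."""
    return "".join(ch for ch, _ in groupby(path) if ch != BLANK)


with_blank = ctc_collapse("hh-e-ll-lo")   # "hello": a blank separates the two l's
without_blank = ctc_collapse("hheelllo")  # "helo": the repeated l is merged away
```

A repeated character survives only if the model emits a blank between the two occurrences, so sequences with many repeats need extra frames and precisely placed blanks.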

slide-45
SLIDE 45

Alternative to CTC & Attention: ACE

  • Aggregation cross-entropy (ACE)
    – Aggregate the probability of each class along the time dimension
    – Treat the accumulated class predictions and the label annotation as probability distributions over all classes
    – Compare these two probability distributions using cross-entropy

Z Xie, et al., Aggregation Cross-Entropy for Sequence Recognition, CVPR 2019 (oral) Code: https://github.com/summerlvsong/Aggregation-Cross-Entropy
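
The three steps of ACE listed on this slide can be sketched in NumPy. This is a minimal illustration, not the authors' code (see the repository linked above for that); it assumes class 0 is the blank, that time steps not accounted for by the label are attributed to blank, and the function name `ace_loss` is hypothetical.

```python
import numpy as np


def ace_loss(probs, label, blank=0):
    """Aggregation cross-entropy for one sequence.

    probs: (T, C) array of per-step class probabilities (each row sums to 1).
    label: sequence of class indices, without blanks.
    """
    T, C = probs.shape
    label = np.asarray(label, dtype=int)
    counts = np.bincount(label, minlength=C).astype(float)
    counts[blank] = T - len(label)          # assumption: remaining steps are blank
    target = counts / T                     # label counts as a distribution
    pred = probs.sum(axis=0) / T            # aggregated predictions as a distribution
    return float(-(target * np.log(pred + 1e-10)).sum())


# Toy example with T = 4 steps and C = 3 classes (0 = blank).
good = np.eye(3)[[1, 2, 2, 0]]   # one-hot predictions matching the label counts
bad = np.eye(3)[[1, 1, 1, 1]]    # predicts class 1 at every step
loss_good = ace_loss(good, [1, 2, 2])
loss_bad = ace_loss(bad, [1, 2, 2])
```

In this sketch the loss depends only on how many times each class occurs, not on where it occurs, so no alignment between time steps and characters has to be computed.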

slide-46
SLIDE 46

slide-47
SLIDE 47

Why End2End?

  • Prevents errors from accumulating
    – Errors can accumulate in a cascade of detection + recognition, which may lead to a large fraction of garbage predictions
  • Joint optimization helps improve overall performance
  • Easier to maintain and adapt to new domains
    – Maintaining a cascaded pipeline with data and model dependencies requires substantial engineering effort
  • Faster, smaller, stronger

Xuebo Liu et al., FOTS: Fast Oriented Text Spotting with a Unified Network, CVPR 2018

slide-48
SLIDE 48

Issues of end-to-end STR

  • Feature sharing between detector and recognizer
  • Connection between detector and recognizer
  • Joint optimization
  • Reading order
    – Left to right; top to bottom; …
  • Significant differences in learning difficulties and convergence rates
    – Different training-sample requirements, training difficulty, and training time
    – Detector: 1,000 to 10,000 samples
    – Recognizer: 100,000 to millions of samples, or more

slide-49
SLIDE 49

Some new techniques to bridge the detector and the recognizer

  • RoI Rotate (multi-oriented e2e)
    – X Liu, et al., FOTS, CVPR 2018
  • Text-alignment (multi-oriented e2e)
    – T. He, et al., TextSpotter, CVPR 2018
  • Tailored RoI pooling (aspect-ratio-preserving resampling)
    – H Li et al., Towards End-to-End Text Spotting in Natural Scenes, arXiv 20190617 (extension of “H Li et al., ICCV 2017”)
  • Perspective RoI transform (arbitrary-shape e2e)
    – Y Sun, et al., TextNet, ACCV 2018
  • RoI Align + char segmentation + SAM (arbitrary-shape e2e)
    – M Liao, P Lyu, et al., Mask TextSpotter, TPAMI 2019
  • RoI Masking (arbitrary-shape e2e)
    – S Qin, A Bissacco, et al. (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019

slide-50
SLIDE 50

Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes

Network architecture

  • By extending the dimensions of the Mask branch's prediction output in Mask R-CNN, the model detects text and recognizes its content end to end.
  • Character segmentation branch
  • Extension in the TPAMI version: introduces the SAM module

(Figure label: Mask branch)

Pengyuan Lyu*, Minghui Liao*, et al., Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes, ECCV 2018; TPAMI 2019

slide-51
SLIDE 51

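Towards Unconstrained End-to-End Text Spotting

  • Mask R-CNN framework + attention decoder
  • RoI Masking
  • Uses millions of external data samples, with OCR engine output as annotation (partial labelling)
    – Results are therefore not comparable with those of other methods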

S Qin, A Bissacco, et al. (Google AI), Towards Unconstrained End-to-End Text Spotting, ICCV 2019

slide-52
SLIDE 52

Discussion & Future Prospect

slide-53
SLIDE 53

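01 Generalization Ability

  • Even without having learned it, most people can still accurately recognize the text in the image on the left
  • Most pharmacists can easily read the prescription in the image below
  • Generalization ability?
  • NLP + OCR?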

slide-54
SLIDE 54

02 OCR & Knowledge & NLP

  • Zero-shot learning: can you guess which character is carved?

slide-55
SLIDE 55

03 Evaluation Issue

  • Consistency of training and testing datasets
  • Consistent evaluation
  • Is end-to-end evaluation necessary to judge a detection method?

Accepted by ICCV 2019 (oral)

slide-56
SLIDE 56

Evaluation metric issue

  • The weakness of the current IoU metric
  • Y. Liu, L. Jin, et al., Tightness-aware Evaluation Protocol for Scene Text Detection, CVPR 2019. Code: https://github.com/Yuliang-Liu/TIoU-metric
  • Some recent related works
    – CY Lee, Y Baek, H Lee, TedEval: A Fair Evaluation Metric for Scene Text Detectors, arXiv 201907
    – HS Lee, Y Yoon, et al., PopEval: A Character-Level Approach to End-To-End Evaluation Compatible with Word-Level Benchmark Dataset, ICDAR 2019 Workshop

slide-57
SLIDE 57

04 Data Issue

  • The performance of deep-learning-based approaches depends heavily on high-quality data
  • Do we have enough data?
    – ICDAR 2011/13/15, MSRA TD500, SVTP, CUTE80, CocoText, MLT 17/19, RCTW, SCUT-CTW1500, TotalText, MTWI, LSVT, ReCTS, ArT, …
  • What is the next good and challenging dataset?
  • TextNet-1M? 1 million text images that contain:
    – Scene text in the wild
    – Document text (in the wild)
    – Handwritten and printed text
    – Multi-lingual (e.g., English + Chinese)
    – Diversity and wide, extensive representation
    – Invoice, menu, poster, driver license card, product label, receipt, slide, street view, …

slide-58
SLIDE 58

Data Synthesis

  • Synth90k (MJSynth)
    – M. Jaderberg et al., Synthetic data and artificial neural networks for natural scene text recognition, NIPS 2014
  • SynthText
    – A. Gupta et al., Synthetic data for text localisation in natural images, CVPR 2016
  • Verisimilar
    – Zhan et al., Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes, ECCV 2018
  • SF-GAN
    – F. Zhan, H. Zhu, S. Lu, Spatial Fusion GAN for Image Synthesis, CVPR 2019
  • SynthText3D
    – M. Liao, et al., SynthText3D: Synthesizing Scene Text Images from 3D Virtual Worlds, arXiv 20190713

slide-59
SLIDE 59

Data Synthesis

Synthetic data still looks a bit fake?

slide-60
SLIDE 60

Crucial role of real data

  • H. Li, P. Wang, C. Shen, G. Zhang, Show, Attend and Read: A simple and strong baseline for irregular text recognition, AAAI 2019

Training data   IIIT5K  SVT   IC13  IC15  SVTP  CT80  COCO-T
OnlySynth       91.5    84.5  91.0  69.2  76.4  83.3  -
Synth+Real      95.0    91.2  94.0  78.8  86.4  89.6  66.8

Must we always train models on synthetic data in order to evaluate the various algorithms?

slide-61
SLIDE 61

New dataset, new challenge, new opportunity

  • ICDAR 2019 LSVT
    – 400,000 partially annotated images (massive!)
    – Train: 30000; Test: 20000
  • ICDAR 2019 ArT
    – Curved text, irregular shapes
    – Train: 5603; Test: 4563
  • MLT 2019
    – Multilingual: 10 languages
    – Train: 10000; Test: 10000
  • ICDAR 2019 ReCTS
    – Street-view signboards; 600,000 character-level annotated samples
    – Train: 20000; Test: 5000
  • ICDAR 2019 ST-VQA
    – Text VQA

slide-62
SLIDE 62

ICDAR 2019 ArT: new opportunity

  • We received an overwhelming number of submissions
    – A total of 78 submissions from 46 unique teams/individuals were received
  • The top-performing scores for each task:
    – i) T1: 82.65% (detection, H-mean)
    – ii) T2.1: 74.30% (recognition, Latin script only, word accuracy)
    – iii) T2.2: 85.32% (recognition, Latin & Chinese, 1-NED)
    – iv) T3.1: 53.86% (end-to-end text spotting, Latin script only, 1-NED)
    – v) T3.2: 54.91% (end-to-end text spotting, Latin & Chinese, 1-NED)
  • End-to-end text spotting seems to be the most challenging task (1-NED < 55%)
  • There is still much room and research opportunity for further improvement
    – especially in recognition and end-to-end text spotting
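
Several of the scores above are reported as 1-NED. The sketch below shows one common way to compute it, assuming NED is the Levenshtein distance between prediction and ground truth, normalized by the longer of the two strings and averaged over samples; the exact protocol of the competition may differ, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def one_minus_ned(preds, gts):
    """1 - mean normalized edit distance over paired predictions and labels."""
    total = 0.0
    for p, g in zip(preds, gts):
        longest = max(len(p), len(g))
        total += levenshtein(p, g) / longest if longest else 0.0
    return 1.0 - total / len(gts)


score = one_minus_ned(["hallo", "x"], ["hello", "y"])   # 1 - (0.2 + 1.0) / 2 = 0.4
```

Unlike word accuracy, this measure gives partial credit to a prediction that is wrong in only a few characters, which matters for long Chinese text lines.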

slide-63
SLIDE 63

05 Gap between research-end and industry-end (or application-end)

Real-world industrial scenarios vs. scenarios defined by academia

slide-64
SLIDE 64

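Are the scenarios defined by academia consistent with those in industry?

Real-world industrial scenarios vs. scenarios defined by academia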

slide-65
SLIDE 65

06 New topic: Document Structure

  • Chargrid model for document understanding
  • Information extraction from invoice documents

AR Katti, et al., Chargrid: Towards Understanding 2D Documents, EMNLP 2018.

slide-66
SLIDE 66

Document Structuring (OCR + GNN + NLP)

  • Document structure & entity extraction
    – e.g., the “Key-Value” issue

X Liu, F Gao, et al, Graph Convolution for Multimodal Information Extraction…, NAACL 2019.

slide-67
SLIDE 67

07 New topic: Text VQA

  • Aniruddha Kembhavi, et al., Are you smarter than a sixth grader? textbook question answering…, CVPR 2018.
  • A Singh, V Natarajan, et al., Toward VQA models that can read, CVPR 2019
  • Scene Text Visual Question Answering, ICCV 2019
  • Anand Mishra, et al., OCR-VQA: Visual Question Answering by Reading Text in Images, ICDAR 2019
  • A Biten, et al., Scene Text Visual Question Answering, arXiv 2019.
  • https://rrc.cvc.uab.es/?ch=11: ICDAR 2019 Robust Reading Challenge on Scene Text Visual Question Answering

slide-68
SLIDE 68

08 New topic: End2end Scene Text Removal

  • ST Zhang, YL Liu, LW Jin, et al., EnsNet: Ensconce Text in the Wild, AAAI 2019. Dataset: https://github.com/HCIILAB/Scene-Text-Removal Code: coming soon…
  • T Nakamura, et al., Scene Text Eraser, ICDAR 2017.
  • O Tursun, et al., MTRNet: A Generic Scene Text Eraser, arXiv 2019
  • L Wu, C Zhang et al., Editing Text in the Wild, ACM MM 2019.
  • P Roy, STEFANN: Scene Text Editor using Font Adaptive Neural Network, arXiv 2019

slide-69
SLIDE 69

Edit Text in the Wild

L Wu, C Zhang et al., Editing Text in the Wild, ACM MM 2019.

SRNet: Style retention network

slide-70
SLIDE 70

09 Handwritings in the Wild?

slide-71
SLIDE 71

10 Other Trends

  • Semi- or weakly-supervised learning
    – X Qin, Y Zhou, et al., Curved Text Detection in Natural Scene Images with Semi- and Weakly-Supervised Learning, ICDAR 2019
    – YP Sun, et al., Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning, ICCV 2019
    – S Qin, A. Bissacco, Towards Unconstrained End-to-End Text Spotting, ICCV 2019: uses the output of a general-purpose OCR engine for weakly supervised learning to improve performance
    – Z Xie, et al., Weakly supervised precise segmentation for historical document images, Neurocomputing, 2019: trains a character classifier with weak supervision to improve text detection performance at high IoU
  • Adversarial learning, e.g.:
    – AK Bhunia, et al., Handwriting Recognition in Low-resource Scripts using Adversarial Learning, CVPR 2019: uses adversarial learning for sample augmentation (adversarial augmentation in feature space)

slide-72
SLIDE 72

10 Safe Machine Learning

  • Robustness
    – Adaptation, worst-case robustness, self-exploration, withstanding adversarial attacks, …
  • Specification
    – Bias of data, alignment between model and human preferences, …
  • Assurance
    – Monitoring/controlling of the system, interpretability
    – Knowing when it knows or does not know (e.g., confidence measurement)
  • Confidence / out-of-distribution (OOD) detection issues
    – HM Yang, XY Zhang, F Yin, CL Liu, Robust Classification with Convolutional Prototype Learning, CVPR 2018
    – M Hein, et al., Why ReLU Networks Yield High-Confidence Predictions Far Away From the Training Data and How to Mitigate the Problem, CVPR 2019
    – S Vernekar, et al., Analysis of Confident-Classifiers for Out-of-distribution Detection, SafeML ICLR 2019 Workshop
    – A. Meinke, M. Hein, Towards neural networks that provably know when they don't know, arXiv 201909 (knowing that you don't know…)
  • SafeML ICLR 2019 Workshop, https://sites.google.com/view/safeml-iclr2019

slide-73
SLIDE 73

Thank you

金连文 (JIN Lianwen)

eelwjin@scut.edu.cn lianwen.jin@gmail.com