Representation in Scene Text Detection and Recognition, Prof. Xiang Bai



SLIDE 1

Representation in Scene Text Detection and Recognition

  • Prof. Xiang Bai

Huazhong University of Science and Technology

SLIDE 2

Contents

  • Problem definition
  • Significance and challenges
  • Previous works
  • Our algorithms
  • Conclusion

SLIDE 3

Contents

  • Problem definition
  • Significance and challenges
  • Previous works
  • Our algorithms
  • Conclusion

SLIDE 4

Problem definition

Scene text detection: the process of predicting the presence of text and localizing each instance (if any), usually at word or line level, in natural scenes

SLIDE 5

Problem definition

Scene text recognition: the process of converting text regions into computer-readable and editable symbols

Example outputs: Tango, ATM, Hotel, BLACK

SLIDE 6

Contents

  • Problem definition
  • Significance and challenges
  • Previous works
  • Our algorithms
  • Conclusion

SLIDE 7

Significance

  • text in natural scenes carries rich and precise high-level semantics
  • text information can be useful to a variety of applications: scene understanding, product search, HCI, virtual reality…

SLIDE 8

Challenges

Diversity of scene text: different colors, scales, orientations, fonts, languages…

SLIDE 9

Challenges

Complexity of background: elements like signs, fences, bricks, and grass are virtually indistinguishable from true text

SLIDE 10

Challenges

Various interference factors: noise, blur, non-uniform illumination, low resolution, partial occlusion…

SLIDE 11

Challenges

These challenges make scene text detection and recognition extremely difficult problems

SLIDE 12

Contents

  • Problem definition
  • Significance and challenges
  • Previous works
  • Our algorithms
  • Conclusion

SLIDE 13

Previous works

Three categories:

  • 1. text detection: only localize text regions, no need to recognize the content
  • 2. text recognition: only recognize the content, assume text regions are given
  • 3. end-to-end text recognition: perform both text detection and recognition

SLIDE 14

Previous works

In the following slides, we will review a number of previous algorithms, mainly from the perspective of representation

SLIDE 15

Text Detection

  • extract character candidates using Maximally Stable Extremal Regions, assuming similar color within each character

  • robust, fast to compute, independent of scale and orientation

[Neumann and Matas, ACCV 2010]

MSER
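
The stability criterion behind MSER can be illustrated with a minimal sketch. It is not the implementation of [Neumann and Matas, ACCV 2010]: it follows a single hypothetical seed pixel instead of building the full component tree, and the names `region_area`, `most_stable_threshold`, and the value of `delta` are assumptions chosen for illustration. A region counts as stable at the threshold where its area changes least when the threshold moves by `delta`.

```python
from collections import deque


def region_area(img, seed, t):
    """Area of the 4-connected region of pixels <= t containing seed (0 if seed > t)."""
    rows, cols = len(img), len(img[0])
    if img[seed[0]][seed[1]] > t:
        return 0
    seen = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen and img[nr][nc] <= t:
                seen.add((nr, nc))
                queue.append((nr, nc))
    return len(seen)


def most_stable_threshold(img, seed, delta=2):
    """Return (threshold, variation) minimising (A(t+delta) - A(t-delta)) / A(t)."""
    best = None
    for t in range(delta, 256 - delta):
        area = region_area(img, seed, t)
        if area == 0:
            continue  # the seed is not part of any region at this threshold
        grown = region_area(img, seed, t + delta)
        shrunk = region_area(img, seed, t - delta)
        variation = (grown - shrunk) / area
        if best is None or variation < best[1]:
            best = (t, variation)
    return best


# Toy image (assumed data): a dark 3-pixel stroke on a bright background.
IMG = [
    [200, 200, 200, 200, 200],
    [200, 20, 200, 200, 200],
    [200, 20, 200, 200, 200],
    [200, 20, 200, 200, 200],
    [200, 200, 200, 200, 200],
]
threshold, variation = most_stable_threshold(IMG, seed=(1, 1))
print(threshold, variation, region_area(IMG, (1, 1), threshold))
```

Because the stroke has uniform intensity, its area stays constant over a wide range of thresholds, which is the "similar color within each character" assumption stated on the slide.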

SLIDE 16

Text Detection

  • extract character candidates with Stroke Width Transform, assuming consistent stroke width within each character

  • robust, fast to compute, independent of scale and orientation

[Epshtein et al., CVPR 2010]

SWT
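
The "consistent stroke width" assumption can be made concrete with a small sketch. The original transform in [Epshtein et al., CVPR 2010] works on edge maps and gradient directions; the sketch below swaps that for axis-aligned run lengths on a binary mask, which is a deliberate simplification. The function names and toy masks are hypothetical.

```python
from statistics import mean, pstdev


def run_length(mask, r, c, dr, dc):
    """Length of the run of foreground pixels through (r, c) along (dr, dc)."""
    rows, cols = len(mask), len(mask[0])
    length = 1
    for sign in (1, -1):
        nr, nc = r + sign * dr, c + sign * dc
        while 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc]:
            length += 1
            nr, nc = nr + sign * dr, nc + sign * dc
    return length


def stroke_widths(mask):
    """Per-pixel width estimate: the shorter of the horizontal and vertical runs."""
    widths = []
    for r, row in enumerate(mask):
        for c, value in enumerate(row):
            if value:
                horizontal = run_length(mask, r, c, 0, 1)
                vertical = run_length(mask, r, c, 1, 0)
                widths.append(min(horizontal, vertical))
    return widths


def width_variation(mask):
    """Coefficient of variation of the width estimates (lower means more stroke-like)."""
    widths = stroke_widths(mask)
    return pstdev(widths) / mean(widths)


# Assumed toy components: a bar of constant width and a wedge of growing width.
BAR = [[1, 1] for _ in range(6)]
WEDGE = [[1 if c <= r else 0 for c in range(6)] for r in range(6)]

print(width_variation(BAR), width_variation(WEDGE))
```

A component whose variation is low is a plausible character candidate; where to place the cut-off is a tuning choice that this sketch leaves open.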

SLIDE 17

Text Detection

MSER and SWT are representative methods in scene text detection and form the basis of many subsequent works:

[Chen et al., ICIP 2011], [Yao et al., CVPR 2012], [Neumann and Matas, CVPR 2012], [Novikova et al., ECCV 2012], [Huang et al., ICCV 2013], [Yin et al., SIGIR 2013], [Koo et al., TIP 2013], [Yin et al., TPAMI 2014], [Yao et al., TIP 2014], [Huang et al., ECCV 2014], …

SLIDE 18

Text Recognition

  • seek character candidates using sliding windows instead of binarization
  • construct a CRF model to impose both bottom-up (i.e. character detections) and top-down (i.e. language statistics) cues

[Mishra et al., CVPR 2012]

Top-Down and Bottom-up Cues
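
How bottom-up and top-down cues combine can be sketched with a chain-structured model decoded by the Viterbi algorithm. This is a simplification under stated assumptions, not the CRF or inference procedure of [Mishra et al., CVPR 2012]: each window position offers character candidates with a detection log-score (bottom-up), and a bigram table supplies the language prior (top-down). The scores, the bigram table, and the name `decode_word` are all hypothetical.

```python
import math


def decode_word(candidates, bigram_logp, unseen_logp=math.log(1e-3)):
    """Return the string maximising the sum of detection and bigram log-scores.

    candidates: one dict per window position, mapping character -> log-score.
    bigram_logp: dict mapping two-character strings -> log-probability.
    """
    # best[ch] = (score, path) of the best labelling so far that ends in ch
    best = {ch: (score, ch) for ch, score in candidates[0].items()}
    for position in candidates[1:]:
        new_best = {}
        for ch, unary in position.items():
            prev_score, prev_path = max(
                (score + bigram_logp.get(path[-1] + ch, unseen_logp), path)
                for score, path in best.values()
            )
            new_best[ch] = (prev_score + unary, prev_path + ch)
        best = new_best
    return max(best.values())[1]


# Assumed toy input: the detector slightly prefers "c" over "o" in the middle.
CANDIDATES = [
    {"h": math.log(0.9)},
    {"c": math.log(0.55), "o": math.log(0.45)},
    {"t": math.log(0.8)},
]
BIGRAMS = {
    "ho": math.log(0.5),
    "ot": math.log(0.5),
    "hc": math.log(0.01),
    "ct": math.log(0.1),
}

greedy = "".join(max(p, key=p.get) for p in CANDIDATES)
print(greedy, decode_word(CANDIDATES, BIGRAMS))
```

Taking the top detection at every position yields an implausible string, while the joint decoding lets the language statistics override a weak detection.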

SLIDE 19

Text Recognition

  • seek character candidates via MSER extraction
  • utilize Weighted Finite-State Transducers to simultaneously introduce a language prior and enforce attribute consistency between hypotheses

[Novikova et al., ECCV 2012]

Large-Lexicon Attribute-Consistent

SLIDE 20

Text Recognition

  • DPM for character detection, with human-designed character structure models and labeled parts
  • build a CRF model to incorporate the detection scores, spatial constraints and linguistic knowledge into one framework

Tree-Structured Model

[Shi et al., CVPR 2013]

SLIDE 21

Text Recognition

Best practice in scene text recognition: redundant character candidate extraction + high-level model for error correction

SLIDE 22

End-to-End Text Recognition

  • detect characters using Random Ferns + HOG
  • find an optimal configuration of a particular word via Pictorial Structure with a Lexicon

[Wang et al., ICCV 2011]

Lexicon Driven
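
The lexicon-driven idea can be sketched as a search for the best left-to-right placement of each lexicon word on the character detections. This is an illustrative simplification: the pictorial-structure model in [Wang et al., ICCV 2011] also scores spatial layout, whereas the sketch only enforces ordering and sums detection scores. The detection tuples, the lexicon, and the function names are assumptions.

```python
NEG_INF = float("-inf")


def word_score(word, detections):
    """Best total score for placing the letters of word on detections, left to right.

    detections: list of (character, x_position, score) tuples.
    Returns -inf when the word cannot be placed.
    """
    dets = sorted(detections, key=lambda d: d[1])
    # best[j] = best score of the current prefix with its last letter on dets[j]
    best = [score if ch == word[0] else NEG_INF for ch, _, score in dets]
    for letter in word[1:]:
        new_best = [NEG_INF] * len(dets)
        running = NEG_INF  # best prefix score among detections earlier in the order
        for j, (ch, _, score) in enumerate(dets):
            if ch == letter and running > NEG_INF:
                new_best[j] = running + score
            running = max(running, best[j])
        best = new_best
    return max(best, default=NEG_INF)


def spot_word(lexicon, detections):
    """Return the lexicon word with the highest placement score."""
    return max(lexicon, key=lambda word: word_score(word, detections))


# Assumed toy detections: "M" and a weaker "N" compete for the last position.
DETECTIONS = [("A", 10, 0.9), ("T", 30, 0.8), ("M", 50, 0.7), ("N", 52, 0.3)]
LEXICON = ["ATM", "ATN", "TAM"]

print(spot_word(LEXICON, DETECTIONS))
```

Summing raw scores favors longer words when scores are positive, so a practical system would need some form of normalization; that choice is left open here.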

SLIDE 23

End-to-End Text Recognition

  • pose character detection as a sequential selection from the set of Extremal Regions (ERs)
  • achieve real-time performance with incrementally computable descriptors

[Neumann and Matas, CVPR 2012]

Real-Time
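
An "incrementally computable" descriptor is one that can be updated in constant time when a pixel joins a region or two regions merge, rather than being recomputed from scratch. The sketch below shows the idea with area and bounding box only; the slide does not list the descriptors used in the cited work, so this choice, the class name `IncrementalRegion`, and the derived features are assumptions.

```python
class IncrementalRegion:
    """Region descriptors that update in O(1) per added pixel or merge."""

    def __init__(self):
        self.area = 0
        self.min_r = self.min_c = float("inf")
        self.max_r = self.max_c = float("-inf")

    def add_pixel(self, r, c):
        """Grow the region by one pixel and update area and bounding box."""
        self.area += 1
        self.min_r, self.max_r = min(self.min_r, r), max(self.max_r, r)
        self.min_c, self.max_c = min(self.min_c, c), max(self.max_c, c)

    def merge(self, other):
        """Absorb another region without revisiting its pixels."""
        self.area += other.area
        self.min_r, self.max_r = min(self.min_r, other.min_r), max(self.max_r, other.max_r)
        self.min_c, self.max_c = min(self.min_c, other.min_c), max(self.max_c, other.max_c)

    def features(self):
        """Cheap features derived from the running descriptors."""
        if self.area == 0:
            raise ValueError("empty region has no features")
        height = self.max_r - self.min_r + 1
        width = self.max_c - self.min_c + 1
        return {
            "aspect_ratio": width / height,
            "fill_ratio": self.area / (width * height),
        }


# Assumed toy usage: grow one region pixel by pixel, then merge a second one.
region = IncrementalRegion()
for pixel in [(0, 0), (0, 1), (1, 0)]:
    region.add_pixel(*pixel)
before_merge = region.features()

other = IncrementalRegion()
other.add_pixel(2, 3)
region.merge(other)
print(before_merge, region.features())
```

Since each update touches only a handful of numbers, features can be evaluated for every region in a threshold sweep at low cost, which is what makes a sequential selection over many regions affordable.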

SLIDE 24

End-to-End Text Recognition

  • localize text regions by integrating multiple existing detection methods
  • recognize characters with a DNN running on HOG features instead of raw pixels
  • use 2.2 million manually labelled examples for training

[Bissacco et al., ICCV 2013]

PhotoOCR

SLIDE 25

End-to-End Text Recognition

  • propose a novel CNN architecture, enabling efficient feature sharing for text detection and character classification
  • generate word and character level annotations via automatic data mining of Flickr

[Jaderberg et al., ECCV 2014]

Deep Features

SLIDE 26

End-to-End Text Recognition

Deep learning + Big data seem to dominate this field

SLIDE 27

Contents

  • Problem definition
  • Significance and challenges
  • Previous works
  • Our algorithms
  • Conclusion

SLIDE 28

Our algorithms

We will introduce two of our works that propose novel representations for better text detection and recognition

SLIDE 29

Multi-Oriented Text Detection

detect texts of different orientations, not limited to horizontal ones, from natural scenes

[1] Cong Yao, Xiang Bai, Wenyu Liu, Yi Ma, and Zhuowen Tu. Detecting texts of arbitrary orientations in natural images. CVPR, 2012.
[2] Cong Yao, Xiang Bai, and Wenyu Liu. A Unified Framework for Multi-Oriented Text Detection and Recognition. TIP, 2014.

SLIDE 30

Multi-Oriented Text Detection

algorithmic pipeline

SLIDE 31

Multi-Oriented Text Detection

two sets of rotation-invariant features that facilitate multi-oriented text detection:

  • component level: estimate center, scale, and direction before feature computation…
  • chain level: size variation, color self-similarity, structure self-similarity…

Main Contribution
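
The component-level idea, estimating center, scale, and direction before computing features, can be sketched with second-order moments. The slides do not say how these quantities are estimated in the cited papers, so the moment-based estimate, the function names, and the toy component below are assumptions. Features measured in the estimated frame come out the same when the component is rotated, scaled, or shifted.

```python
import math


def estimate_frame(points):
    """Estimate center, scale, and major-axis direction from second-order moments."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points) / n
    syy = sum((y - cy) ** 2 for _, y in points) / n
    sxy = sum((x - cx) * (y - cy) for x, y in points) / n
    direction = 0.5 * math.atan2(2 * sxy, sxx - syy)
    scale = math.sqrt(sxx + syy)
    return (cx, cy), scale, direction


def normalised_extents(points):
    """Extent along the major and minor axes, measured in the estimated frame."""
    (cx, cy), scale, direction = estimate_frame(points)
    cos_t, sin_t = math.cos(direction), math.sin(direction)
    # Project every point onto the major axis (u) and the minor axis (v).
    us = [((x - cx) * cos_t + (y - cy) * sin_t) / scale for x, y in points]
    vs = [(-(x - cx) * sin_t + (y - cy) * cos_t) / scale for x, y in points]
    return max(us) - min(us), max(vs) - min(vs)


def transform(points, angle, factor, dx, dy):
    """Rotate by angle (radians), scale by factor, then shift by (dx, dy)."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (factor * (x * cos_a - y * sin_a) + dx, factor * (x * sin_a + y * cos_a) + dy)
        for x, y in points
    ]


# Assumed toy component: an elongated 6 x 2 grid of pixel coordinates.
COMPONENT = [(float(x), float(y)) for x in range(6) for y in range(2)]
MOVED = transform(COMPONENT, angle=0.6, factor=3.0, dx=10.0, dy=-4.0)

print(normalised_extents(COMPONENT), normalised_extents(MOVED))
```

The principal axis is only defined up to a 180 degree flip, so the sketch sticks to extents, which are unaffected by that ambiguity.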

SLIDE 32

Multi-Oriented Text Detection

detection examples on the MSRA TD-500 dataset

Qualitative Results

SLIDE 33

Multi-Oriented Text Detection

detected texts in various languages

Qualitative Results

SLIDE 34

Multi-Oriented Text Detection

compare favorably with the state-of-the-art algorithms when handling horizontal texts

Quantitative Results

SLIDE 35

Multi-Oriented Text Detection

achieve much better performance on texts of arbitrary orientations

Quantitative Results

SLIDE 36

Mid-Level Elements for Text Recognition

a learned multi-scale mid-level representation for scene text recognition

[1] Cong Yao, Xiang Bai, Baoguang Shi, and Wenyu Liu. Strokelets: A Learned Multi-Scale Representation for Scene Text Recognition. CVPR, 2014.

SLIDE 37

Mid-Level Elements for Text Recognition

Figure labels: training examples, multi-scale sampling, discriminative clustering, strokelets

the discriminative clustering algorithm proposed in [Singh et al., ECCV 2012] is adopted to learn a set of mid-level primitives, called strokelets, which capture the substructures of characters at different granularities

SLIDE 38

Mid-Level Elements for Text Recognition

learned strokelets and the instances shown in the original images

SLIDE 39

Mid-Level Elements for Text Recognition

character detection and description with strokelets

SLIDE 40

Mid-Level Elements for Text Recognition

learned strokelets on different languages: Chinese, Korean, Russian

Qualitative Results

SLIDE 41

Mid-Level Elements for Text Recognition

robust to interference factors like noise, blur, non-uniform illumination, partial occlusion, font variation, scale change

Qualitative Results

SLIDE 42

Mid-Level Elements for Text Recognition

achieve state-of-the-art performance on IIIT 5K-Word, a large, challenging dataset in this field

Quantitative Results

SLIDE 43

Mid-Level Elements for Text Recognition

achieve highly competitive performance on ICDAR 2003 and SVT

Quantitative Results

SLIDE 44

Mid-Level Elements for Text Recognition

achieve significantly enhanced performance (5% improvement on average) after modification

Recent Advance

SLIDE 45

Contents

  • Problem definition
  • Significance and challenges
  • Previous works
  • Our algorithms
  • Conclusion

SLIDE 46

Conclusion

The common key to the success of the above surveyed text detection and recognition methods is representation, just as in many other vision problems

SLIDE 47

Conclusion

Conventional methods rely on human-designed representations (MSER, SWT, HOG), while CNN-based algorithms learn representations directly from data

SLIDE 48

Conclusion

Learning representation from data is the future trend

SLIDE 49

Conclusion

But there is still a long way to go, since challenges remain: multi-scale, multi-orientation, multi-language, …

SLIDE 50

Thank You!