Màster de Visió per Computador Curs 2006 - 2007

OPTICAL CHARACTER RECOGNITION

Outline

  • Introduction
  • Pre-processing (document level)
    – Binarization
    – Skew correction
  • Segmentation
    – Layout analysis
    – Character segmentation
  • Pre-processing (character level)
  • Feature extraction
    – Image-based features
    – Statistical features
    – Transform-based features
    – Structural features
  • Classification
  • Post-processing
    – Classifier combination
    – Exploitation of context information
  • Examples of OCR systems
  • Bibliography

Optical Character Recognition

[Diagram: OCR lies at the intersection of statistical pattern recognition, structural pattern recognition, image processing and document analysis, with its own methods and applications.]

Some examples

  • Books, journals, reports
  • Postal addresses
  • Drawings, maps
  • Identity cards
  • License plates
  • Quality control
  • PDAs
  • Cheques, bills
  • Old documents


Document Image Analysis

What is a document? Objects created expressly to convey information encoded as iconic symbols

  – Scanned images from paper documents
  – Electronic documents
  – Multimedia documents (video with text)
  – …

Document image analysis is the subfield of digital image processing that aims at converting document images to symbolic form for modification, storage, retrieval, reuse and transmission. Document image analysis is the theory and practice of recovering the symbol structure of digital images scanned from paper or produced by computer.


  • G. Nagy: Twenty years of document image analysis in PAMI. IEEE Trans. on PAMI, vol. 22, nº 1, pp. 38-62, January 2000.

Applications of DIA

  • Document understanding:
    – Recognition
    – Interpretation
    – Indexing
    – Retrieval
  • Document imaging:
    – Digitization
    – Storage
    – Compression
    – Re-printing


DIA tasks

  • Mostly graphics documents:
    – Document imaging: acquisition, binarization, filtering, vectorization
    – Document understanding: text-graphics separation, symbol recognition, interpretation
  • Mostly text documents:
    – Document imaging: acquisition, binarization, filtering, skew correction
    – Document understanding: segmentation, layout analysis, OCR

Outline of the course

1. Acquisition
2. Pre-processing
   − Binarization
   − Skew correction
3. Layout analysis
4. Character segmentation
5. OCR
   − Feature extraction
   − Classification
   − Post-processing

Focus: document understanding of mostly text documents


Categorization of Character Recognition

  • According to the type of writing:
    – Machine-printed character recognition
    – Hand-written character recognition
  • According to the type of acquisition:
    – On-line character recognition
    – Off-line character recognition

Machine-printed character recognition

  • Characters are totally defined by the font type:

– Dimensions (segmentation)

  • Character width
  • Inter-character separation
  • Character height

– Shape (recognition)

  • Typographic effects (boldface, italics, underline).
  • Challenges:

  – Similar shapes among characters
  – Multiple fonts
  – Joined characters
  – Digitization noise: broken lines, random noise, heavy characters, etc.
  – Document degradation: old documents, photocopies, etc.


Machine-printed character recognition

  • Classification of machine-printed OCR systems:
    – Monofont: one single type of font
    – Multifont: recognition of a fixed and known set of fonts; it is necessary to identify and learn the differences between characters of all the font types
    – Omnifont: recognition of any arbitrary font, even if it has not been previously learned

Off-line hand-written character recognition

  • Hand-written
  • Off-line: acquisition by a scanner or a camera
  • Challenges:

  – Shape variability among images of the same character
  – Character segmentation

  • Subproblems:

  – Hand-written numeral recognition: digit recognition
  – Hand-printed character recognition: well-separated characters
  – Cursive character recognition: non-separated characters


On-line hand-written character recognition

  • On-line acquisition

  – Digitizer tablets
  – Digital pens
  – Tablet PCs

  • Advantages with respect to off-line acquisition:

  – The image is acquired while the text is written
  – We can take advantage of dynamic information:

  • Temporal information: writing order, stroke segmentation, etc
  • Writing speed
  • Pen pressure
  • Subproblems:

  – Cursive script recognition
  – Signature verification/recognition

Levels of difficulty in character recognition

Level 0
  • Little shape variability
  • Small number of characters
  • Little noise
    0.0. Printed characters. Specific font. Constant size. Roman alphabets.
    0.1. Constrained hand-printed characters. Arabic numerals.

Level 1
  • Medium variation in shape
  • Medium noise
    1.0. Printed characters. Multiple fonts. Nº characters < 100
    1.1. Loosely constrained hand-printed characters. Nº characters < 100
    1.2. Chinese characters of few fonts
    1.3. Loosely constrained hand-printed characters. Nº characters ≈ 1000

Level 2
  • Much variation in shape
  • Heavy noise
    2.0. Printed characters of multiple fonts
    2.1. Unconstrained hand-printed characters
    2.2. Affine transformed characters

Level 3
  • Non-segmented strings of characters
    3.0. Touching/broken characters
    3.1. Cursive handwriting characters
    3.2. Characters on a textured background

  • S. Mori, H. Nishida, H. Yamada: Optical Character Recognition. John Wiley and Sons, 1999.
  • S.V. Rice, G. Nagy, T.A. Nartker: Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, 1999.


Levels of difficulty in character recognition

Level 0

0.0. Printed character of a specific font with a constant size

  • Constant size
  • Connectivity of characters
  • Variation in the stroke thickness
  • Little noise

0.1. Constrained hand-printed characters

  • Characters are written according to some instructions or box guidelines

Solved problem


Levels of difficulty in character recognition

Level 1

1.0. Printed characters of multiple fonts
1.1. Loosely constrained hand-printed characters
1.2. Chinese characters of few fonts
1.3. Loosely constrained hand-printed characters. Nº characters ≈ 1000

Solved problem



Levels of difficulty in character recognition

Level 2

2.0. Printed characters of multiple fonts
2.1. Unconstrained hand-printed characters
2.2. Affine transformed characters

Levels of difficulty in character recognition

Level 3

3.0. Touching or broken characters
3.1. Cursive handwriting characters
3.2. Characters on a textured background


Databases for OCR

  • Off-line hand-written characters:
    – CEDAR: 50,000 segmented numerals from zip codes; 5,000 zip codes; 5,000 city names; 9,000 state names
    – CENPARMI: 17,000 manually segmented numerals from zip codes
    – NIST: more than 1,000,000 characters from forms; several learning and test sets (more variability)
  • On-line characters:
    – UNIPEN: definition of a format to represent on-line data; 4,500,000 characters; 91,500 sentences with a dictionary; segmented characters, words and sentences
  • Machine-printed documents:
    – Univ. Washington: more than 1,500 pages of articles in English; more than 500 pages of articles in Japanese; originals, photocopies and pages with artificially generated noise; page segmentation into labeled zones

Performance evaluation of OCR systems

  • Hand-printed character recognition:
    – Institute for Posts and Telecommunications Policy (IPTP), Japan, 1996
    – 5,000 hand-written numerals from Japanese zip codes
    – Performance of the best system: 97.94% (human performance: 99.84%)
  • Machine-printed character recognition:
    – "The Fifth Annual Test of OCR Accuracy". Information Science Research Institute, TR-96-01, April 1996. http://www.isri.unlv.edu
    – 5,000,000 characters from 2,000 pages of journals, newspapers, letters and technical reports
    – Performance on good quality documents: 99.77% - 99.13%
    – Performance on medium quality documents: 99.27% - 98.21%
    – Performance on low quality documents: 97.01% - 89.34%
  • Note: a performance of 99% still means about 30 errors per page (at 3,000 characters/page)


Performance evaluation of OCR systems

Database     Classifier   N test samples   Recognition (%)   Error (%)
MNIST        GPR          10,000           99.06             0.94
MNIST        VSVMb        10,000           99.62             0.38
MNIST        VSV2         10,000           99.44             0.56
MNIST        LeNet-5      10,000           99.18             0.82
MNIST        POE          10,000           98.32             1.68
CENPARMI     VSVM          2,000           98.7              1.3
USPS         VSVM          2,007           97.66             2.34
NIST SD19    MLP          30,000           99.16             0.84

  • C.Y. Suen, J. Tan: Analysis of errors of handwritten digits made by a multitude of classifiers. Pattern Recognition Letters 26, pp. 369-379, 2005.

Performance evaluation of OCR systems

Number of errors per classifier, grouped by error category:

Database     Classifier   Category 1   Category 2   Category 3   Sum
MNIST        GPR          24           11           59           94
MNIST        VSVMb        15           6            17           38
MNIST        VSV2         15           9            32           56
MNIST        LeNet-5      17           14           51           82
MNIST        POE          41           9            118          168
CENPARMI     VSVM         6            4            16           26
USPS         VSVM         13           13           21           47
NIST SD19    MLP          30           8            81           119
Sum                       161          74           395          630
Percentage (%)            25.56        11.75        62.70        100


Components of an OCR system

ACQUISITION → DOCUMENT PRE-PROCESSING → SEGMENTATION → CHARACTER PRE-PROCESSING → FEATURE EXTRACTION → CLASSIFICATION → POST-PROCESSING

  • Document pre-processing: filtering, binarization, skew correction
  • Segmentation: layout analysis, text/graphics separation, character segmentation
  • Character pre-processing: filtering, normalization
  • Feature extraction: image-based, statistical, transform-based and structural features
  • Classification: based on models obtained in a learning stage
  • Post-processing: exploitation of context information

Acquisition

  • Acquisition: scanners
  • A scanner is a linear camera with a lighting system and a displacement mechanism

[Diagram: scanner optical path: document, lights, scan line, opening, lens, CCD, video circuit, digital image]


Important features in a scanner:

  • Optical resolution / interpolated resolution
  • Bits/pixel (depth)
  • Speed (acquisition and calibration)
  • Connection (parallel, USB, SCSI)
  • Programming tools (TWAIN protocol, specific programming languages such as HP's SCL, etc.)
  • Automatic feeding

Types of scanners:

  • Flatbed scanners: the CCD line is displaced along the paper
  • Traction scanners: the paper is displaced across the CCD line
  • Others: specific scanners for negative films, cards, passports, etc.

  • Acquisition: scanners vs. cameras
  • Resolution determines:
    – Quality of the image
    – Size of the image
    – Speed of acquisition
  • Minimal resolution for OCR: 200 dpi
    – A 12-point character (approx. 2x3 mm) at 200 dpi generates a 16x24 image
  • A4 page (297x210 mm):
    – Scanner at 200 dpi: 2376x1680 image
    – Camera of 1024x1024 pixels: resolution of about 90 dpi
  • DNI identity card (85x55 mm):
    – Scanner at 200 dpi: 670x435 image
    – Camera of 1024x1024 pixels: resolution of about 300 dpi

[Figure: a character scanned at 200 dpi, showing digitization noise]
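The resolution figures above follow from simple arithmetic; a minimal sketch (the slide's 90/300 dpi values are rounded, and the helper name is illustrative):

```python
# Effective resolution of a fixed-size sensor imaging a document:
# dpi = sensor pixels / document size in inches (1 inch = 25.4 mm).
def effective_dpi(sensor_pixels, size_mm):
    return sensor_pixels / (size_mm / 25.4)

print(round(effective_dpi(1024, 297)))  # A4 height with a 1024-pixel camera: ~88 dpi
print(round(effective_dpi(1024, 85)))   # DNI width with a 1024-pixel camera: ~306 dpi
```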


  • Acquisition: scanners vs. cameras
  • Advantages of scanners:
    – Cost
    – Resolution
    – Lighting is under control
    – Control of optical distortions
  • Advantages of cameras:
    – Acquisition speed
    – More flexibility to adapt to the environment and to the material to read

On-line acquisition

  • Input device: simulates pen and paper
    – The image is acquired while it is generated
    – A special input device provides x and y coordinates over time
  • Components of the device:
    – Pen, paper and support; at least one of these components must be special
  • Technical specifications:
    – Resolution
    – Sample frequency
    – Precision

[Images: digitizer tablet without display, digitizer tablet with display, Tablet PC, digital pen and paper]


Components of an OCR system (current stage: document pre-processing)

Binarization

  • Global methods:
    – Apply the same global threshold to all the pixels of the image
  • Local adaptive methods:
    – Apply a different threshold to every pixel depending on the local distribution of gray values
  • Special case: binarization of textured backgrounds

  • O.D. Trier, T. Taxt: Evaluation of binarization methods for document images. IEEE Trans. on PAMI, vol. 17, nº 3, pp. 312-315, 1995.


Binarization: Otsu

  • Two classes of gray-scale levels: black pixels (foreground) and white pixels (background)
  • Probabilistic criterion to select the threshold:
    – Maximize inter-class variability
    – Minimize intra-class variability

  • N. Otsu: A threshold selection method from gray-level histograms. IEEE Trans. on Systems, Man and Cybernetics, vol. 9, nº 1, pp. 62-66, 1979.

Otsu

  • The probability distribution of gray levels is defined as $p_i = n_i / N$, where $n_i$ is the number of pixels with gray level $i$ and $N$ is the total number of pixels.
  • We want to define two classes of gray levels, where $k$ is the threshold:
    – C1: $[1..k]$, with probability $\omega_1 = \sum_{i=1}^{k} p_i$ and $P(i|C_1) = p_i/\omega_1$
    – C2: $[k+1..L]$, with probability $\omega_2 = \sum_{i=k+1}^{L} p_i$ and $P(i|C_2) = p_i/\omega_2$
  • The mean and variance of each class are defined as:

$$\mu_1 = \sum_{i=1}^{k} i\,P(i|C_1) \qquad \sigma_1^2 = \sum_{i=1}^{k} (i-\mu_1)^2\,P(i|C_1)$$
$$\mu_2 = \sum_{i=k+1}^{L} i\,P(i|C_2) \qquad \sigma_2^2 = \sum_{i=k+1}^{L} (i-\mu_2)^2\,P(i|C_2)$$

  • The total mean and variance are $\mu_T = \sum_{i=1}^{L} i\,p_i$ and $\sigma_T^2 = \sum_{i=1}^{L} (i-\mu_T)^2\,p_i$.


Otsu

  • We define the within-class and between-class variances:

$$\sigma_W^2 = \omega_1 \sigma_1^2 + \omega_2 \sigma_2^2 \qquad \sigma_B^2 = \omega_1 (\mu_1 - \mu_T)^2 + \omega_2 (\mu_2 - \mu_T)^2$$

  • The following criteria of class separability are equivalent:

$$\lambda = \sigma_B^2 / \sigma_W^2 \qquad \kappa = \sigma_T^2 / \sigma_W^2 \qquad \eta = \sigma_B^2 / \sigma_T^2 \qquad \text{with } \kappa = \lambda + 1,\ \eta = \lambda/(\lambda+1)$$

  • Then, the optimal threshold is:

$$k^* = \arg\max_{1 \le k \le L} \sigma_B^2(k)$$
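A minimal NumPy sketch of Otsu's threshold selection, maximizing the between-class variance over a 256-bin histogram (function and variable names are illustrative):

```python
import numpy as np

def otsu_threshold(gray):
    """Select the threshold k that maximizes sigma_B^2(k)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    p = hist / hist.sum()                      # gray-level probabilities p_i
    omega1 = np.cumsum(p)                      # class probability of C1 = [0..k]
    mu = np.cumsum(p * np.arange(256))         # cumulative first moment
    mu_T = mu[-1]                              # total mean
    denom = omega1 * (1.0 - omega1)            # guard against empty classes
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_B2 = (mu_T * omega1 - mu) ** 2 / denom
    sigma_B2[denom == 0] = 0.0
    return int(np.argmax(sigma_B2))

# binary = gray > otsu_threshold(gray)   # foreground/background split
```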

Binarization: local adaptive binarization

  • The threshold at every pixel depends on the local distribution of gray levels in the neighborhood of the pixel.
  • Niblack's method:

$$T(x,y) = m(x,y) + k \cdot s(x,y)$$

    – m(x,y) and s(x,y) are the mean and standard deviation in a local neighborhood of the pixel
    – Window size: 15 x 15
    – k = -0.2
  • Eikvil et al.'s method:
    – For every pixel, we define a small window S (3 x 3) and a large window L (15 x 15)
    – The threshold value is selected by applying Otsu to the large window L
    – The pixels in the small window S are thresholded using this value

  • W. Niblack: An Introduction to Digital Image Processing, pp. 115-116, Prentice Hall, 1986.
  • L. Eikvil, T. Taxt, K. Moen: A fast adaptive method for binarization of document images. International Conference on Document Analysis and Recognition, pp. 435-443, 1991.
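A sketch of Niblack's rule T(x,y) = m(x,y) + k·s(x,y), using box filters for the local statistics (scipy.ndimage.uniform_filter is assumed available):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def niblack_binarize(gray, window=15, k=-0.2):
    """Per-pixel threshold T = m + k*s over a window x window neighborhood."""
    g = gray.astype(np.float64)
    m = uniform_filter(g, size=window)          # local mean m(x,y)
    m2 = uniform_filter(g * g, size=window)     # local mean of squares
    s = np.sqrt(np.maximum(m2 - m * m, 0.0))    # local std deviation s(x,y)
    return g > m + k * s                        # True = background (white)
```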


Binarization: evaluation

[Figure: the same document binarized with Otsu (global), Eikvil and Niblack]

  • Visual criteria of evaluation:
    – Broken line structures: gaps in lines
    – Broken symbols and text: symbols and text with gaps
    – Blurring of lines, symbols and text
    – Loss of complete objects
    – Noise in homogeneous areas: noisy spots and false objects in both background and print
  • Scale of 1-5 for each criterion


Binarization: textured backgrounds

  1. Selection of candidate thresholds:
     – Iterative application of Otsu
     – At each iteration, Otsu is applied to the part of the histogram with the lowest mean
     – The number of iterations depends on the number of peaks in the histogram
     – We get a set of possible threshold values
  2. Computation of a set of texture features for each possible binarization:
     – Based on the run-length histogram of the binarized image, R(i), i ∈ {1,..,L}
  3. Selection of the optimal binarization

  • Y. Liu, S. Srihari: Document image binarization based on texture features. IEEE Trans. on PAMI, vol. 19, nº 5, pp. 540-544, 1997.


Binarization: textured backgrounds

  • Texture features, computed on the run-length histogram R(i) of each candidate binarization:
    – Stroke width: run-length with the highest frequency

$$SW = \arg\max_i R(i), \quad i \neq 1$$

    – Stroke-like pattern noise: relevance of stroke-like patterns in the background between consecutive threshold selections j and j+1. If it is high, it denotes stroke-like patterns due to texture.

$$SPN = \frac{\max_i R_{j+1}(i)}{\max_i R_j(i)}, \quad i \neq 1$$

    – Unit run noise: relevance of unit run-lengths. Ideally, it should be low.

$$URN = \frac{R(1)}{\max_i R(i)}, \quad i \neq 1$$

    – Long run noise: relevance of long run-lengths. Ideally, it should be low.

$$LRN = \frac{\sum_{i > L} R(i)}{\max_i R(i)}, \quad i \neq 1$$

    – Broken character: it will be high if characters are broken.

$$BC = \frac{\min_{i \in I'} R(i)}{\max_i R(i)}, \quad I' = \{1, \ldots, \arg\max_i R(i)\}$$

Binarization: textured backgrounds

  • Experimental results show that 2 iterations of Otsu are enough
  • Decision tree to select the optimal threshold (between T1 and T2, with T1 > T2):
    1. Select the value with the larger stroke width feature.
    2. If T1 is the selected value:
       – If background noise features are low, T1 is the selected threshold.
       – Otherwise, if T2 does not result in many broken characters, select T2.
    3. If T2 is the selected value:
       – If the broken character feature is low, select T2.
       – Otherwise, if noise features are low with T1, select T1.
    4. If neither of the two thresholds is good enough, select the average between them.

Binarization of textured backgrounds

[Figure: binarization results on textured backgrounds]

Skew correction: projection profiles

  1. Compute the horizontal projection at several angles (depending on the desired resolution)
  2. For every projection, compute a directional criterion that estimates the difference between maxima and minima in the projection:
     • Sum of squared differences in adjacent rows
     • Variance of the number of black pixels in a scan line
  3. Select the angle that maximizes the directional criterion

  • W. Postl: Detection of linear oblique structures and skew scan in digitized documents. 8th International Conference on Pattern Recognition, pp. 687-689, 1986.
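A brute-force sketch of this criterion: rotate the binary page over a range of candidate angles and keep the one with the highest projection variance (the angle range and step are illustrative choices):

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary, angles=np.arange(-10, 10.25, 0.25)):
    """Projection-profile skew estimation: the correct angle gives the
    sharpest line/gap alternation, i.e. the highest profile variance."""
    best_angle, best_score = 0.0, -np.inf
    for a in angles:
        rot = rotate(binary.astype(float), a, reshape=False, order=1)
        profile = rot.sum(axis=1)              # black pixels per scan line
        score = profile.var()                  # directional criterion
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```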


Skew correction: projection profiles

  • Modification of the previous algorithm:
    – Use only the bottom-centers of connected components to compute the projection profiles
    – Reduces computation cost

  • H.S. Baird: The skew angle of printed documents. Proc. of the Society of Photographic Scientists and Engineers, pp. 14-21, 1987.

Analysis of neighboring connected components

  1. Compute connected components.
  2. For every connected component, search for the k nearest neighbours (k = 5).
  3. For every pair of connected components, compute the angle between the centroids.
  4. Compute the histogram of angles.
  5. Estimation of the skew angle: maximum of the histogram.

  • L. O'Gorman: The document spectrum for page layout analysis. IEEE Trans. on PAMI, vol. 15, nº 11, 1993.
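A sketch of this rough estimate (O'Gorman's docstrum idea); connected components and centroids come from scipy.ndimage, and the histogram bin width is an assumption:

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def skew_from_components(binary, k=5):
    """Histogram of centroid-to-centroid angles between each connected
    component and its k nearest neighbours; the mode is the skew angle."""
    lbl, n = label(binary)
    cents = np.array(center_of_mass(binary, lbl, range(1, n + 1)))  # (y, x)
    angles = []
    for i, c in enumerate(cents):
        d = np.hypot(*(cents - c).T)           # distances to all centroids
        d[i] = np.inf                          # exclude the component itself
        for j in np.argsort(d)[:k]:
            dy, dx = cents[j] - c
            angles.append(np.degrees(np.arctan2(dy, dx)))
    hist, edges = np.histogram(angles, bins=np.arange(-90, 90.5, 0.5))
    return edges[np.argmax(hist)]              # maximum of the histogram
```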


Analysis of neighboring connected components

  • Accurate angle estimation:
    1. Find connected components in the same line (clusters of pairs of connected components with an angle near the rough estimation).
    2. Fit a straight line (using regression) to the centroids of the components in each line.
    3. Make the final estimation from these text lines.

Components of an OCR system (current stage: segmentation)


Page Layout Analysis

  • Layout analysis: segmentation of the image into several blocks with the same type of information: text, graphics, table, image, etc.
  • Methods:
    – Run-length smearing
    – Analysis of connected components

[Figure: example of a page segmented into layout blocks]

Page Layout Analysis: run-length smearing

  1. Horizontal run-length smearing
     • Threshold: inter-character separation (maximum of the histogram of the width of white runs)
  2. Vertical smearing
     • Threshold: inter-line separation (estimated during skew correction)
  3. Logical AND between both images
  4. Additional horizontal smearing
  5. Connected components are the blocks
  6. Computation of features for each connected component: aspect ratio, black pixel density, Euler number, perimeter length, perimeter-to-width ratio, perimeter-squared-to-area ratio

[Example: a binary scan line before and after smearing]

  • J.L. Fisher, S.C. Hinds, D.P. D'Amato: A rule-based system for document image segmentation. 10th International Conference on Pattern Recognition, pp. 567-572, 1990.
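A minimal sketch of the smearing steps above; the gap thresholds are illustrative and would normally come from the inter-character and inter-line estimates:

```python
import numpy as np

def smear_rows(binary, max_gap):
    """Horizontal run-length smearing: fill white runs shorter than max_gap."""
    out = binary.copy()
    for row in out:
        ink = np.flatnonzero(row)
        if ink.size < 2:
            continue
        gaps = np.diff(ink)                      # distance between ink pixels
        for start, gap in zip(ink[:-1], gaps):
            if 1 < gap <= max_gap:               # short white run -> fill it
                row[start:start + gap] = 1
    return out

def rlsa(binary, h_gap=30, v_gap=50):
    """RLSA block mask: horizontal AND vertical smearing of a 0/1 image."""
    h = smear_rows(binary, h_gap)
    v = smear_rows(binary.T, v_gap).T            # vertical = smear the transpose
    return h & v                                 # logical AND of both images
```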

Page Layout Analysis: run-length smearing

  • A set of rules classifies each block into text or non-text according to the features of each connected component

[Figure: run-length smearing applied to a document page]

Page Layout Analysis: analysis of connected components

  1. Detection of connected components
  2. Definition of distance and overlap among components, from the bounding-box limits l and u and the width W and height H of each component:

$$D_x(i,j) = \max[l_x(i), l_x(j)] - \min[u_x(i), u_x(j)] \qquad D_y(i,j) = \max[l_y(i), l_y(j)] - \min[u_y(i), u_y(j)]$$
$$V_x(i,j) = \frac{-D_x(i,j)}{\min[W(i), W(j)]} \qquad V_y(i,j) = \frac{-D_y(i,j)}{\min[H(i), H(j)]}$$

  3. Grouping of connected components in the same line:
     • $D_x$ below a distance threshold
     • $V_y$ above an overlap threshold

  • A.K. Jain, B. Yu: Document representation and its application to page decomposition. IEEE Trans. on PAMI, vol. 20, nº 3, pp. 294-308, 1998.


Page Layout Analysis: Analysis of connected components

  4. Classification of lines into text and non-text. Text lines have:
     • Height below a threshold and horizontal alignment (standard deviation of the bottom edges of the CCs below a threshold)
     • Width over a threshold and all CCs of similar height (ratio between mean and standard deviation less than a threshold)
  5. Grouping of text lines into text regions:
     • Vertically close and horizontally overlapped
  6. Grouping of non-text lines into non-text regions:
     • Vertically close and horizontally overlapped
     • Horizontally close and vertically overlapped
  7. Identification of image regions among non-text regions:
     • Large regions
     • Large ratio of black pixels
  8. Identification of table regions:
     • Detection of horizontal and vertical lines with similar orientations
     • Similar height of CCs
  9. Remaining non-text regions: drawing regions

[Figure: page decomposition into text, image, table and drawing regions]


Page Layout Analysis: multiscale analysis

  1. Application of wavelets to obtain a multiscale representation
  2. Computation of local features (local moments) from the wavelet representation over each window $W_i$:

$$f_n(W_i) = \frac{1}{|W_i|} \sum_{x \in W_i} \left( f(x) - \mu_{W_i} \right)^n$$

  3. Training of a neural network to classify each block as text, image or graphics according to these local features
  4. Propagation of the classification through adjacent blocks and between different scales of the wavelet representation

Page Layout Analysis: performance evaluation

  • Goals:
    – Evaluate the performance of several commercial OCR engines on a set of journal pages
    – Use the results of the evaluation to define methods for the combination of these engines in order to improve the overall performance
  • Therefore, the evaluation must make it possible to determine the strengths and weaknesses of each method


Page Layout Analysis: performance evaluation

[Figure: ground-truth vs. output of segmentation, with error zones marked as correct, misrecognized or unrecognized]

Page Layout Analysis: performance evaluation

  • 5 types of zones:
    – Text
    – Graphics
    – Table
    – Background
    – Text over image
  • Comparison between the ground truth and the output of the engines: overlapping between output zones and ground-truth zones

Page Layout Analysis: performance evaluation

  • Evaluation measures: more than 100 measures grouped in six categories
    – Good recognition measures: percentage of ground-truth area (grouped by zone type) recognized as zones of the same type in the output, i.e. text zones in the ground truth recognized as text zones, graph zones recognized as graph zones, etc.
    – Unrecognition measures: relative area of zones of type text, graph, text-over-image or table in the ground truth recognized as background
    – Misrecognition measures: zones in the ground truth recognized as a different type, for example a text zone recognized as graph, text-over-image or table
    – Overlap measures: relative area (grouped by type) recognized twice by a zoning engine
    – Split and merge measures: how many zones are recognized and assigned in terms of splitting and merging errors


Page Layout Analysis: performance evaluation

  • Experiments:
    – Creation of the ground truth for 100 journal pages
    – Evaluation of six OCR engines
    – Tests with two image formats (TIFF and JPEG) and 4 image resolutions (100 dpi, 200 dpi, 300 dpi and 400 dpi)
    – Evaluation of a simple combination scheme

Page Layout Analysis: performance evaluation

[Charts: text correct recognition and text misrecognition scores for ABBYY 6, ABBYY 7, PODCORE, PODCORE2, SCANSOFT, XVISION and their combination (COMB), on TIFF and JPG images at 100-400 dpi]


Page Layout Analysis: performance evaluation

[Charts: graph and text-over-image (ToI) correct recognition scores for the same engines, on TIFF and JPG images at 100-400 dpi]

Page Layout Analysis: performance evaluation

  • Conclusions:
    – Image format makes no difference in the final results
    – No ideal resolution:
      • Sometimes better at 300 dpi, sometimes better at 400 dpi
      • Results are a little lower at 200 dpi or 100 dpi
    – Combination can improve the results, but more advanced combination schemes should be defined


Text/Graphics Separation

  • Analysis of connected components:
    – Size: characters are smaller than graphic components
    – x-y aspect ratio: characters are more "squared" than graphic components
    – Pixel density: characters are denser than graphic components
  • Grouping of characters: Hough transform and component proximity
  • Difficulties:
    – Joined characters
    – Characters touching lines

  • L.A. Fletcher, R. Kasturi: A robust algorithm for text string separation from mixed text/graphics images. IEEE Trans. on PAMI, vol. 10, nº 5, pp. 910-918, 1988.

Text/Graphics Separation

  • Specific methods: detection and removal of lines at given orientations (horizontal, vertical, ±22.5º, ±45º, ±67.5º):
    – Detection of long consecutive runs of black pixels after rotating the image at the given orientations
    – Analysis of connected components to separate text and graphics

  • Z. Lu: Detection of text regions from digital engineering drawings. IEEE Trans. on PAMI, vol. 20, nº 4, pp. 431-439, April 1998.


Text/Graphics Separation

  • Vertical and horizontal run-length smearing to join components
  • Classification of the final components as text or graphics based on their density and size
  • Recovery of the original image from the enclosing rectangles of the text components

Character segmentation

  • Segmentation of characters in blocks of text
  • Levels of difficulty:
    – Characters with uniform separation and fixed width
    – Well separated characters with proportional width
    – Broken characters
    – Touching characters
    – Broken and touching characters
    – Cursive script
    – Hand-printed words
    – Handwritten cursive words

  • R.G. Casey, E. Lecolinet: A survey of methods and strategies in character segmentation. IEEE Trans. on PAMI, vol. 18, nº 7, pp. 690-706, 1996.
  • Y. Lu: Machine printed character segmentation - An overview. Pattern Recognition, vol. 28, nº 1, pp. 67-80, 1995.
  • Y. Lu, M. Shridhar: Character segmentation in handwritten words - An overview. Pattern Recognition, vol. 29, nº 1, pp. 77-96, 1996.


Character segmentation

  • Relevant features for segmentation:
    – Character width and height
    – Distance between characters
    – Inter-character interval (distance between character centres)
    – Aspect ratio
    – Baseline and top baseline
    – Ascenders and descenders

Character segmentation: classification of methods

  • External segmentation:
    – Segmentation before recognition; independent processes
    – The goal is to find the exact location of character separations
    – Low performance with cursive script, touching characters or handwriting
  • Internal segmentation:
    – Based on the Sayre paradox: a letter cannot be segmented without being recognized and cannot be recognized without being segmented
    – Segmentation and recognition are done at the same time
    – Recognition generates or validates segmentation hypotheses
  • Holistic methods:
    – No character segmentation
    – Recognition tries to recognize words without recognizing individual characters


Character segmentation: classification of methods

[Taxonomy: analytical vs. holistic methods. Analytical methods split into external (based on the image: dissection, post-processing into graphemes) and internal (based on recognition: windowing with dynamic programming, or feature-based with Markov, e.g. Hidden Markov Models, and non-Markov variants); hybrid methods combine both.]

External segmentation

  • Image decomposition into sub-images using general features
  • Each sub-image corresponds to a possible character
  • Combination of several methods:
    – Connected component labelling
    – Run-length smearing
    – Projections and X-Y tree decomposition
    – Analysis of contours
    – Analysis of profiles


External segmentation: connected component labelling

[Figure: original image → labelling according to neighbours → end of labelling → unification of equivalent classes]
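A sketch of the classical two-pass labelling with union-find for the unification of equivalent classes (4-connectivity; in practice scipy.ndimage.label does the same job much faster):

```python
import numpy as np

def label_components(binary):
    """Two-pass connected component labelling (4-connectivity)."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    parent = [0]                                 # union-find forest

    def find(a):
        while parent[a] != a:
            a = parent[a]
        return a

    nxt = 1
    for y in range(h):                           # pass 1: provisional labels
        for x in range(w):
            if not binary[y, x]:
                continue
            up = labels[y - 1, x] if y > 0 else 0
            left = labels[y, x - 1] if x > 0 else 0
            if up == 0 and left == 0:            # new provisional label
                parent.append(nxt)
                labels[y, x] = nxt
                nxt += 1
            else:
                lab = min(l for l in (up, left) if l)
                labels[y, x] = lab
                for other in (up, left):         # record equivalences
                    if other:
                        parent[find(other)] = find(lab)
    for y in range(h):                           # pass 2: unify classes
        for x in range(w):
            if labels[y, x]:
                labels[y, x] = find(labels[y, x])
    return labels
```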

External segmentation: projections and X-Y trees

  • If text lines are perfectly separated, only one vertical projection is required. Otherwise, it is necessary to apply several alternating horizontal and vertical projections.

[Figure: X-Y tree decomposition of a page: horizontal projection, then vertical projection, then horizontal projection, yielding nodes 1, 1.1, ..., 2.4.2]


External segmentation: projections

  • It is more robust to use the second derivative of the projection instead of the value at each point of the histogram (it enhances the projection minima):

$$F(x) = V(x-1) - 2V(x) + V(x+1)$$
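A small sketch computing the vertical projection V(x) and its second difference F(x); columns with low V and high F are candidate cut points:

```python
import numpy as np

def cut_candidates(binary):
    """Vertical ink projection and its second difference F(x)."""
    V = binary.sum(axis=0).astype(float)        # vertical projection V(x)
    F = np.zeros_like(V)
    F[1:-1] = V[:-2] - 2 * V[1:-1] + V[2:]      # F(x) = V(x-1) - 2V(x) + V(x+1)
    return V, F
```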

External segmentation: run-length smearing

  • Run-lengths: sequences of consecutive pixels with the same colour in a row or column
  • Smearing: inversion of white runs with a length below a certain threshold

[Figure: horizontal smearing and vertical smearing combined with a logical AND]


External segmentation: analysis of profiles

  • Determination of the point of separation between characters:
    – Follow the profile beginning at a minimum up to the following maximum
    – Distance between upper and lower profiles

[Figure: upper and lower profiles of a word]

External segmentation: post-processing

  • Problem: broken and touching characters
    – Analysis of the bounding box: definition of several rules that permit joining or breaking them properly, based on:
      • Estimated character size (width and height)
      • Number of estimated characters
      • Component aspect ratio
      • Proximity/overlapping of bounding boxes


External segmentation: post-processing

  • "Hit and deflect" strategy:
    – Starting point: maximum of the lower profile / minimum of the upper profile
    – Contour following:
      • Vertical scan to find a contour point
      • Move the scan point according to the value of the neighbouring pixels:
        – To the right/left, if one pixel corresponds to the character and the other does not
        – Up, if both neighbouring pixels belong to the character

  • M. Shridhar, A. Badreldin: Recognition of isolated and simply connected handwritten numerals. Pattern Recognition, vol. 19, nº 1, pp. 1-12, 1986.

External segmentation: oversegmentation

  • Segmentation of the image into sub-images that do not necessarily correspond to individual characters
  • Analysis of the contour minima and maxima:
    – Detection of significant contour minima:
      • Cut points
      • Lower extreme of a character
    – Generation of possible cut points:
      • For each contour minimum, search to the left and to the right for points that correspond to a single vertical run with low density
    – Compaction of nearby cut points

  • R. Bozinovic, S. Srihari: Off-line cursive script word recognition. IEEE Trans. on PAMI, vol. 11, nº 1, 1989.

Character segmentation: classification of methods (taxonomy diagram repeated)

Internal segmentation

  • Segmentation and recognition at the same time
  • Two approaches:
    – Windowing:
      • Sequential scan of the image from left to right
      • Generation of segmentation hypotheses
      • Detection of the cut points with the best recognition performance (verification step)
    – Based on image features:
      • Feature detection
      • Generation of possible correspondences between features and letters
      • Search for the best possible combination among all correspondences
      • Two types of methods: Markov-based and non-Markov-based


Internal segmentation: windowing

  • A mobile window is used to generate possible segmentation sequences
  • Each possible segmentation is validated by the recognition process
  • Search for a segmentation sequence that yields a valid final result

  • R.G. Casey, G. Nagy: Recursive segmentation and classification of composite patterns. 6th International Conference on Pattern Recognition, pp. 349-451, 1982.

Internal segmentation: windowing

  • Shortest path segmentation:
    – Representation of all segmentation possibilities with a graph:
      • Nodes: all possible combinations of pre-segmented zones
      • Edges: neighbour compatibility between pre-segmented zones
    – Using a neural network, each node is assigned a recognized character, with a measure of confidence (distance)
    – Finding the shortest path in the graph is equivalent to finding the best possible segmentation

  • C.J.C. Burges, J.I. Be, C.R. Nohl: Recognition of handwritten cursive postal words using neural networks. Proc. USPS 5th Advanced Technology Conference, p. A-117, 1992.

Internal segmentation: feature-based

  • Graph-based:
    – Representation of the image skeleton with a graph
    – Subgraph matching to find possible correspondences with the character prototypes
    – Creation of a network:
      • Nodes: recognized prototypes labelled with the matching cost
      • Edges: adjacency relationships among nodes
    – Recognition: searching for the optimal path in the network

[Figure: network of candidate character prototypes for a word image]

  • J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995.

Components of an OCR system (current stage: character pre-processing)


Character pre-processing

Some usual pre-processing operations in OCR:

  • Filtering: noise reduction
  • Thinning
  • Binarization
  • Normalization:
    – Reduce character variability
    – Convert the character to a normal shape:
      • Orientation
      • Slant
      • Size
      • Stroke thickness

Normalization

Inverse transforms to reduce intra-class variance. The most usual normalization transforms are:

  • Rotation: rotated scans, text in graphic documents
  • Slant: cursive fonts or handwriting
  • Stroke thickness: bold fonts or very thin strokes, handwriting with different pen thickness
  • Size: titles, footnotes, handwriting


Normalization

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$

Type         Parameters                                              Meaning
Translation  a11 = a22 = 1, a12 = a21 = 0                            tx, ty: translation factors
Scaling      a11 = sx, a22 = sy, a12 = a21 = 0                       sx, sy: scaling factors
Rotation     a11 = cos α, a12 = -sin α, a21 = sin α, a22 = cos α     α: rotation angle
Slant        a11 = 1, a12 = tan β, a21 = 0, a22 = 1                  β: slant angle
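The table rows translate directly into 2x2 matrices; a minimal sketch (angles in radians, names illustrative):

```python
import numpy as np

def make_transform(kind, **p):
    """Linear part A of the normalization transform (x', y') = A (x, y) + t."""
    if kind == "scale":
        return np.array([[p["sx"], 0.0], [0.0, p["sy"]]])
    if kind == "rotation":
        a = p["alpha"]
        return np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    if kind == "slant":
        return np.array([[1.0, np.tan(p["beta"])], [0.0, 1.0]])
    return np.eye(2)  # translation: identity linear part, offset goes in t

# Example: undo an estimated slant of 15 degrees with the inverse shear
A_inv = np.linalg.inv(make_transform("slant", beta=np.radians(15)))
```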

Normalization

  • Rotation:
    – To determine the baseline of a set of aligned characters: projection analysis, Hough transform
    – To determine the orientation of a single character: inertia axes (second-order moments)
  • Slant:
    – Approximation of the slant angle from the regression line of the character pixels
    – Approximation from the orientation of "vertical" segments in a word


Normalization

  • Size:
    – Normalization to a standard size
    – Normalization of the relation between x size and y size
    – Usually done from the bounding box of the character
    – Some pixel resampling is required: interpolation to avoid aliasing
  • Stroke thickness:
    – Not straightforward to determine because of the variability in the character itself
    – Approximation of the stroke thickness:
      • It is assumed that the character stroke has length l and width w
      • The width is estimated from the area and perimeter of the character
    – Then, morphological operations (dilation and erosion) are used to normalize the character to a standard width

Non-linear normalization

  • Optical (radial) distortion:
    – Parameter estimation with a least-squares criterion using a calibration image:

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = C_m \begin{pmatrix} x \\ y \end{pmatrix} + C_d \begin{pmatrix} x(x^2+y^2) \\ y(x^2+y^2) \end{pmatrix}$$

    – $C_m$: magnification coefficient; $C_d$: distortion coefficient


Components of an OCR system (current stage: feature extraction)

Feature-based recognition

  • Probably, the choice of the feature extraction method is the most important factor in achieving good recognition performance.
  • Which are the best features to discriminate between the characters?


Feature Extraction

Goal: to extract from the image the most relevant information for classification, i.e., to minimize intra-class variability while maximizing inter-class variability.

  • Selection of appropriate features:
    – It is a critical decision
    – It depends on the specific application
    – Features must be invariant to the expected character variations (depending on the application): rotation, degradation, noise, shape distortion
    – Low dimensionality, to avoid large learning sets
    – Features determine the type of information to work with: gray-level image, binary image, character contour, vectorization of the skeleton, etc.
    – Features also determine the type of classifier

Feature Extraction

  • Image-based features:
    – Projections
    – Profiles
    – Crossings
  • Statistical features:
    – Moments
    – Zoning
  • Global transforms and series expansions:
    – Karhunen-Loeve
    – Fourier descriptors
  • Topological and geometric features; structural analysis:
    – Contour analysis
    – Skeleton analysis
    – Topological and geometric features

  • O.D. Trier, A.K. Jain, T. Taxt: Feature extraction methods for character recognition - A survey. Pattern Recognition, vol. 29, nº 4, pp. 641-662, 1996.


Image-based features

  • The whole image as feature vector:
    – Classification by correlation
    – Very sensitive to noise, character distortion and similarity between classes
  • x and/or y projections:
    – The accumulated projection can be used too
    – Sensitive to rotation, distortion and a large number of characters
  • Peephole:
    – Coding some pre-selected pixels of the image as a binary number
    – The pre-selected pixels can vary depending on the character to be recognized

[Example: peephole coding of an "A" as the binary string 010111011]

Image-based features

  • Contour profiles:
    – Left (right) profile: contour minimum (maximum) x at every y value
    – Lower (upper) profile: contour minimum (maximum) y at every x value
    – Features:
      • Profile values
      • Difference between consecutive profile values
      • Maxima and minima of the profile
      • Maxima of the difference between profile values

  • F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. Pattern Recognition, 24(10), pp. 969-983, 1991.


Image-based features

  • Crossing method:
    – Features are computed from the number of times the character is crossed by vectors along some orientations, for example 0º, 45º, 90º, 135º
    – Used in commercial systems because of its speed and low complexity
    – Robust to some distortions and noise
    – Sensitive to size variations
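A minimal sketch of the crossing method for horizontal (0º) scan vectors; the rows to scan are an illustrative choice:

```python
import numpy as np

def horizontal_crossings(binary, rows):
    """Number of white-to-ink transitions along the selected scan rows."""
    feats = []
    for r in rows:
        line = binary[r].astype(int)
        feats.append(int(np.count_nonzero(np.diff(line) == 1)))  # 0 -> 1 steps
    return feats

# Example: crossings at 25%, 50% and 75% of the character height h
# feats = horizontal_crossings(char_img, [h // 4, h // 2, 3 * h // 4])
```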

Statistical features

  • Methods based on the statistical distribution of pixels in the image:
    – Geometric moments
    – Zoning
  • Features are robust to distortion and, up to a certain extent, to some style variations
  • Low computation time and easy to implement
  • A learning step is needed to infer the character models


Statistical features: zoning

  • The image is divided into n x m cells
  • For each cell, the mean of gray levels is computed, and all these values are joined in a feature vector of length n x m
  • We can also use information from the contour or any other feature computed in every zone (e.g. a histogram of contour orientations 0º, 45º, 90º, 135º per zone)

  • F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. Pattern Recognition, 24(10), pp. 969-983, 1991.
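A minimal zoning sketch: the mean gray level of each cell of an n x m grid, concatenated into a feature vector:

```python
import numpy as np

def zoning_features(gray, n=4, m=4):
    """Mean gray level per cell of an n x m grid (feature vector of n*m)."""
    h, w = gray.shape
    ys = np.linspace(0, h, n + 1, dtype=int)    # cell row boundaries
    xs = np.linspace(0, w, m + 1, dtype=int)    # cell column boundaries
    return np.array([gray[ys[i]:ys[i+1], xs[j]:xs[j+1]].mean()
                     for i in range(n) for j in range(m)])
```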

Statistical features: geometric moments

  • Moments of order (p+q) of image f:

$$m_{pq} = \sum_{x=1}^{N} \sum_{y=1}^{N} f(x,y)\, x^p y^q$$

    – $m_{00}$: character area (in binary images)
    – Center of gravity of the character: $\bar{x} = m_{10}/m_{00}$, $\bar{y} = m_{01}/m_{00}$
  • Central moments (centering the character at the center of gravity):

$$\mu_{pq} = \sum_{x=1}^{N} \sum_{y=1}^{N} f(x,y)\, (x-\bar{x})^p (y-\bar{y})^q$$

    – The central moments of order 2 ($\mu_{20}$, $\mu_{02}$, $\mu_{11}$) permit computing:
      • Main inertia axes
      • Character length
      • Character orientation:

$$\theta = \frac{1}{2} \arctan\left( \frac{2\mu_{11}}{\mu_{20} - \mu_{02}} \right)$$

  • M. Hu: Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, vol. 8, pp. 179-187, 1962.
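A sketch of the moment features above for a binary character image (arctan2 keeps the orientation formula stable when µ20 ≈ µ02):

```python
import numpy as np

def moment_features(binary):
    """Raw and central moments up to order 2, plus the orientation angle."""
    y, x = np.nonzero(binary)                   # coordinates of ink pixels
    m00 = len(x)                                # area (binary image)
    xb, yb = x.mean(), y.mean()                 # center of gravity
    mu20 = ((x - xb) ** 2).sum()                # central moments of order 2
    mu02 = ((y - yb) ** 2).sum()
    mu11 = ((x - xb) * (y - yb)).sum()
    theta = 0.5 * np.arctan2(2 * mu11, mu20 - mu02)  # inertia-axis orientation
    return m00, xb, yb, mu20, mu02, mu11, theta
```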

Statistical features: invariant moments

  • Central moments $\mu_{pq}$ are translation-invariant
  • Scale invariants:

$$\nu_{pq} = \frac{\mu_{pq}}{\mu_{00}^{1+(p+q)/2}}, \qquad p+q \ge 2$$

  • Rotation invariants (order 2):

$$\phi_1 = \nu_{20} + \nu_{02} \qquad \phi_2 = (\nu_{20} - \nu_{02})^2 + 4\nu_{11}^2$$

  • Invariants to general linear transforms:

$$I_1 = \mu_{20}\mu_{02} - \mu_{11}^2 \qquad I_2 = (\mu_{30}\mu_{03} - \mu_{21}\mu_{12})^2 - 4(\mu_{30}\mu_{12} - \mu_{21}^2)(\mu_{03}\mu_{21} - \mu_{12}^2) \qquad \psi_1 = \frac{I_1}{\mu_{00}^4}$$

  – A set of moment invariants of different orders can be defined in a similar way

  • T.H. Reiss: The revised fundamental theorem of moment invariants. IEEE Trans. on PAMI, vol. 13, nº 8, pp. 830-834, 1991.

Statistical features: Zernike moments

  • Geometric moments:
    – Projection of the function f(x,y) over the monomials $x^p y^q$ (no orthogonality ⇒ information redundancy)
  • Zernike moments:
    – Change to polar coordinates to achieve orthogonality and rotation invariance
    – Projection of the image over the Zernike polynomials $V_{nm}$, which are orthogonal inside the unit circle $x^2 + y^2 = 1$:

$$V_{nm}(x,y) = V_{nm}(\rho,\theta) = R_{nm}(\rho)\, e^{jm\theta}$$

where $\rho = \sqrt{x^2+y^2} \le 1$, $\theta = \tan^{-1}(y/x)$, $n \ge 0$, $|m| \le n$, $n-|m|$ even, and

$$R_{nm}(\rho) = \sum_{s=0}^{(n-|m|)/2} (-1)^s\, \frac{(n-s)!}{s!\left(\frac{n+|m|}{2}-s\right)!\left(\frac{n-|m|}{2}-s\right)!}\, \rho^{n-2s}$$

  • A. Khotanzad, Y.H. Hong: Invariant image recognition by Zernike moments. IEEE Trans. on PAMI, vol. 12, nº 5, pp. 489-497, 1990.

Statistical features: Zernike moments

  • The image is projected over the Zernike polynomials:

$$f(x,y) = \sum_n \sum_m A_{nm}\, V_{nm}(\rho, \theta)$$

  • The coefficients $A_{nm}$ are the Zernike moments of order n and repetition m:

$$A_{nm} = \frac{n+1}{\pi} \sum_x \sum_y f(x,y)\, V_{nm}^*(\rho, \theta)$$

where $x^2 + y^2 \le 1$ (the image must be re-scaled to the unit circle) and * denotes the complex conjugate.

  • $|A_{nm}|$ is rotation-invariant
  • Relation between Zernike moments and geometric moments:

$$A_{00} = \frac{\mu_{00}}{\pi} \qquad A_{11} = A_{1,-1} = 0 \ \text{(centered image)} \qquad A_{22} = \frac{3}{\pi}\left(\mu_{20} - \mu_{02} - 2j\mu_{11}\right) \qquad A_{20} = \frac{3}{\pi}\left[2(\mu_{20}+\mu_{02}) - \mu_{00}\right]$$

Statistical features: Zernike moments

  • To reconstruct the image from Zernike moments:

$$f(x,y) = \lim_{N \to \infty} \sum_{n=0}^{N} \ \sum_{\substack{m:\ |m| \le n \\ n-|m|\ \text{even}}} A_{nm}\, V_{nm}(x,y)$$

[Figure, rows 1-2: Zernike moments of order 1-13 displayed via $I(x,y) = \sum_m A_{nm} V_{nm}(x,y)$; rows 3-4: image reconstruction from Zernike moments of order 1-13. Orders 1 and 2, for example, represent orientation, height and width.]


Statistical features: Zernike moments

[Figure: image reconstruction using moments up to order 10 (66 moments)]

Statistical features: Zernike pseudo-moments

  • Less noise-sensitive than Zernike moments
  • Better recognition results
  • Obtained by removing the constraint that $n - |m|$ is even:

$$R_{nm}(\rho) = \sum_{s=0}^{n-|m|} (-1)^s\, \frac{(2n+1-s)!}{s!\,(n+|m|+1-s)!\,(n-|m|-s)!}\, \rho^{n-s}$$


Statistical features: invariant moments

  • Experiments show that a robust OCR system needs at least 10-15 features, i.e., we need to define between 10 and 15 geometric invariants
  • Handwritten digit recognition:
    – Moments up to order 6:
      • Regular moments (24 moments): 94%
      • Zernike moments (23 moments): 95%
      • Zernike pseudo-moments (44 moments): 91.5%
    – Moments of higher orders: decrease in recognition performance

  • S.O. Belkasim, M. Shridhar, M. Ahmadi: Pattern recognition with moment invariants: a comparative study and new results. Pattern Recognition, vol. 24, nº 12, pp. 1117-1138, 1991.

Transform-based features

  • Instead of using the character image directly as the feature vector, a linear transform $g = T \cdot f$ is applied to compute the features, where T is a matrix of constant values
  • These transforms help to reduce the dimensionality of the feature vector, preserving the most relevant information about the shape of the character
  • The original image can be reconstructed from the feature vector
  • Features are invariant to some global deformations, such as translation and rotation
  • High computational cost
  • Some examples:
    – Karhunen-Loeve expansion
    – Fourier series


Transform-based features: Karhunen-Loeve expansion

  • The KL transform is defined as:

$$g = T^t (f - \bar{f}), \qquad \bar{f} = \frac{1}{N} \sum_{i=1}^{N} f_i$$

where f is the feature vector of the image and $\bar{f}$ is the mean of all the samples representing the character.

  • Each column of T is an eigenvector of the covariance matrix:

$$C = \frac{1}{N} \sum_{i=1}^{N} (f_i - \bar{f})(f_i - \bar{f})^t$$

  • Usually, only M (M < d) eigenvectors are used, corresponding to the largest eigenvalues. In this way, dimensionality is reduced.

Transform-based features: Karhunen-Loeve expansion

  • For each image, we get a feature vector x of dimension d
  • The learning set is composed of n samples per class
  • The set of samples is represented by the matrix X, of dimension n x d
  • For each class, we can compute the covariance matrix R:

$$R = \frac{1}{n-1} X^T X = \frac{1}{n-1} \sum_i (x_i - \bar{x})(x_i - \bar{x})^T$$

  • Then, the transform matrix T is built from the eigenvectors $v_i$ of the covariance matrix, with eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_d$:

$$T = [v_1 \cdots v_d]$$

  • The transformation of an image x is $y = T^T x$


Transform-based features: Karhunen-Loeve expansion

  • Usually, the dimensionality is reduced and we only take the m eigenvectors of R with the greatest weight
  • Then, the transform matrix becomes $P = [v_1 \cdots v_m]$
  • For each image $x_i$, the feature vector is $y_i = P^T x_i$
  • Usually m is selected in such a way that the eigenvectors explain some pre-specified percentage of the total variance (usually 0.9 or 0.95)
  • The fraction of variance explained by the m eigenvectors is given by:

$$\frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{d} \lambda_i}$$
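A sketch of the KL feature computation: eigen-decomposition of the covariance matrix and selection of m by explained variance:

```python
import numpy as np

def kl_basis(X, variance=0.95):
    """Karhunen-Loeve basis from samples X (n x d): keep the leading
    eigenvectors that explain the requested fraction of total variance."""
    mean = X.mean(axis=0)
    Xc = X - mean                                  # center the samples
    R = Xc.T @ Xc / (len(X) - 1)                   # covariance matrix (d x d)
    lam, V = np.linalg.eigh(R)                     # ascending eigenvalues
    lam, V = lam[::-1], V[:, ::-1]                 # sort descending
    m = int(np.searchsorted(np.cumsum(lam) / lam.sum(), variance)) + 1
    return V[:, :m], mean

def kl_features(x, P, mean):
    return P.T @ (x - mean)                        # y = P^T (x - mean)
```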

Transform-based features: Karhunen-Loeve expansion

  • Application of the KL transform to the NIST database:
    – Digit recognition: 96% - 97%
    – Uppercase recognition: 89% - 90%
    – Lowercase recognition: 77% - 82%

  • M.D. Garris, J.L. Blue, G.T. Candela, P.J. Grother, S.A. Janet, C.L. Wilson: NIST form-based handprint recognition system (release 2.0). Technical report NISTIR 5959, National Institute of Standards and Technology, USA, 1994.


Transform-based features: Fourier descriptors

  • Decomposition as a Fourier series of a periodic function of period T:

$$f(t) = \sum_{n=-\infty}^{\infty} C_n\, e^{j 2\pi n t / T}, \qquad C_n = \frac{1}{T}\int_T f(t)\, e^{-j 2\pi n t / T}\, dt$$

or, in real form:

$$f(t) = A + \sum_{n=1}^{\infty} \left( a_n \cos\frac{2\pi n t}{T} + b_n \sin\frac{2\pi n t}{T} \right)$$
$$A = \frac{1}{T}\int_T f(t)\, dt \qquad a_n = \frac{2}{T}\int_T f(t) \cos\frac{2\pi n t}{T}\, dt \qquad b_n = \frac{2}{T}\int_T f(t) \sin\frac{2\pi n t}{T}\, dt$$

Transform-based features: Fourier descriptors

  • The shape contour can be described in complex form as $z(s) = x(s) + j\,y(s)$
  • The contour can be described as a function of the tangent angle $\theta(s)$:

$$x(s) = x(0) + \int_0^s \cos\theta(\alpha)\, d\alpha \qquad y(s) = y(0) + \int_0^s \sin\theta(\alpha)\, d\alpha$$

  • Defining the function of accumulation of the tangent angle, $\Phi(l) = \theta(l) - \theta(0)$, this function is normalized to the range [0, 2π]:

$$\Phi^*(t) = \Phi\!\left(\frac{L t}{2\pi}\right) + t$$

  • Finally, this function can be decomposed into Fourier descriptors:

$$\Phi^*(t) = a_0 + \sum_{k=1}^{\infty} (a_k \cos kt + b_k \sin kt)$$

  • C.T. Zahn, R.Z. Roskies: Fourier descriptors for plane closed curves. IEEE Trans. on Computers, vol. C-21, nº 3, pp. 269-281, 1972.


Transform-based features: Fourier descriptors

  • In the discrete case, the contour is a polygon with vertices $z(s_j) = x(s_j) + j\,y(s_j)$ and tangent-angle increments

$$\Delta\Phi_j = \tan^{-1}\!\left[\frac{y(s_j) - y(s_{j-1})}{x(s_j) - x(s_{j-1})}\right] - \tan^{-1}\!\left[\frac{y(s_{j-1}) - y(s_{j-2})}{x(s_{j-1}) - x(s_{j-2})}\right]$$

  • Then:

$$a_0 = -\pi - \frac{1}{L}\sum_{k=1}^{m} l_k \Delta\Phi_k \qquad a_n = -\frac{1}{n\pi}\sum_{k=1}^{m} \Delta\Phi_k \sin\frac{2\pi n l_k}{L} \qquad b_n = \frac{1}{n\pi}\sum_{k=1}^{m} \Delta\Phi_k \cos\frac{2\pi n l_k}{L}$$

  • Fourier descriptors depend on the starting point
  • For 64x64 images, it has been shown that 5 coefficients are enough to discriminate between "2" and "Z"

Transform-based features: elliptic Fourier descriptors

  • The contour is expanded as two Fourier series (T is the contour length):

$$\hat{x}(t) = A_0 + \sum_{n=1}^{N}\left[a_n \cos\frac{2\pi n t}{T} + b_n \sin\frac{2\pi n t}{T}\right] \qquad \hat{y}(t) = C_0 + \sum_{n=1}^{N}\left[c_n \cos\frac{2\pi n t}{T} + d_n \sin\frac{2\pi n t}{T}\right]$$

with $\hat{x}(t) \to x(t)$ and $\hat{y}(t) \to y(t)$ as $N \to \infty$.

  • In the discrete case, with m contour pixels, $\Delta t_i = \sqrt{\Delta x_i^2 + \Delta y_i^2}$, $t_i = \sum_{j \le i} \Delta t_j$ and $\phi_i = 2\pi t_i / T$:

$$a_n = \frac{T}{2n^2\pi^2}\sum_{i=1}^{m}\frac{\Delta x_i}{\Delta t_i}\left[\cos\phi_i - \cos\phi_{i-1}\right] \qquad b_n = \frac{T}{2n^2\pi^2}\sum_{i=1}^{m}\frac{\Delta x_i}{\Delta t_i}\left[\sin\phi_i - \sin\phi_{i-1}\right]$$
$$c_n = \frac{T}{2n^2\pi^2}\sum_{i=1}^{m}\frac{\Delta y_i}{\Delta t_i}\left[\cos\phi_i - \cos\phi_{i-1}\right] \qquad d_n = \frac{T}{2n^2\pi^2}\sum_{i=1}^{m}\frac{\Delta y_i}{\Delta t_i}\left[\sin\phi_i - \sin\phi_{i-1}\right]$$

  • Invariance to starting point: the phase shift with respect to the main axis is computed and the coefficients are rotated according to this angle:

$$\theta_1 = \frac{1}{2}\tan^{-1}\!\left[\frac{2(a_1 b_1 + c_1 d_1)}{a_1^2 - b_1^2 + c_1^2 - d_1^2}\right] \qquad \begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix} = \begin{pmatrix} a_n & b_n \\ c_n & d_n \end{pmatrix}\begin{pmatrix} \cos n\theta_1 & -\sin n\theta_1 \\ \sin n\theta_1 & \cos n\theta_1 \end{pmatrix}$$

  • F.P. Kuhl, C.R. Giardina: Elliptic Fourier features of a closed contour. Computer Vision, Graphics and Image Processing, vol. 18, pp. 236-258, 1982.
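A sketch of the elliptic Fourier coefficients following the discrete formulas above, with the contour given as an (m, 2) array of (x, y) points (duplicate points are assumed removed):

```python
import numpy as np

def elliptic_fourier(contour, order=12):
    """Elliptic Fourier coefficients (a_n, b_n, c_n, d_n) of a closed contour."""
    d = np.diff(np.vstack([contour, contour[:1]]), axis=0)  # close the contour
    dt = np.hypot(d[:, 0], d[:, 1])               # segment lengths dt_i
    t = np.concatenate([[0.0], np.cumsum(dt)])
    T = t[-1]                                      # total contour length
    phi = 2 * np.pi * t / T                        # phi_i = 2*pi*t_i / T
    coeffs = []
    for n in range(1, order + 1):
        c, s = np.cos(n * phi), np.sin(n * phi)
        k = T / (2 * n**2 * np.pi**2)
        a = k * np.sum(d[:, 0] / dt * (c[1:] - c[:-1]))
        b = k * np.sum(d[:, 0] / dt * (s[1:] - s[:-1]))
        cc = k * np.sum(d[:, 1] / dt * (c[1:] - c[:-1]))
        dd = k * np.sum(d[:, 1] / dt * (s[1:] - s[:-1]))
        coeffs.append((a, b, cc, dd))
    return np.array(coeffs)
```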


Transform-based features: Fourier descriptors

[Figure: character "5" reconstructed using elliptic Fourier descriptors of order 1, 2, ..., 10; 15, 20, 30, 40, 50 and 100 respectively]

  • Rotation invariance: the orientation $\psi_1$ of the main semi-axis is computed and the coefficients are rotated:

$$\psi_1 = \tan^{-1}\frac{c_1^*}{a_1^*} \qquad \begin{pmatrix} a_n^{**} & b_n^{**} \\ c_n^{**} & d_n^{**} \end{pmatrix} = \begin{pmatrix} \cos\psi_1 & \sin\psi_1 \\ -\sin\psi_1 & \cos\psi_1 \end{pmatrix}\begin{pmatrix} a_n^* & b_n^* \\ c_n^* & d_n^* \end{pmatrix}$$

  • Scale invariance: the coefficients are divided by the magnitude of the main semi-axis, $E^* = \sqrt{a_1^{*2} + c_1^{*2}}$

Transform-based features: Fourier Descriptors

  • Experiments with handwritten digits (100 images per digit):

– 12 elliptic descriptors: 99.7%
– 12 non-elliptic descriptors: 99.5%

  • Experiments with digits + lowercase letters:

– 12 elliptic descriptors: 98.6%
– 12 non-elliptic descriptors: 90.1%

Feature Extraction

Evaluation

  • T. Taxt, J.B. Olafsdottir, M. Daehlen: Recognition of Handwritten Symbols. Pattern Recognition, vol. 23, nº 11, pp. 1155-1166, 1990

slide-64
SLIDE 64

64

127

Structural Analysis

  • Contour analysis
  • Skeleton analysis
  • Analysis of topological and geometric features

Methods based on the analysis of the character structure, starting from the detection of some features and the relationships between them (the basic idea is to divide the character into its basic parts)

Feature Extraction

128


Structural Analysis: Run-length encoding

  • It is the simplest structural representation
  • Run-length encoding represents each image row as a sequence of pairs (l, g), where each pair represents a run of l consecutive pixels with gray level g
  • For binary images, only the sequence of run lengths is required (e.g., 2, 4, 5, 1, 4, 3, 1)

Example row: 0 0 4 4 4 2 2 2 2 2 4 4 4 1 1 1 1 1 1 1 1 0 0 0 0 → (2,0), (3,4), (5,2), (3,4), (8,1), (4,0)
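A minimal Python sketch of the encoder, reproducing the example above (the function name is ours):

```python
def run_length_encode(row):
    """Encode one image row as (length, gray level) pairs."""
    runs = []
    current, length = row[0], 1
    for value in row[1:]:
        if value == current:
            length += 1
        else:
            runs.append((length, current))
            current, length = value, 1
    runs.append((length, current))
    return runs

row = [0, 0, 4, 4, 4, 2, 2, 2, 2, 2, 4, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(run_length_encode(row))   # [(2, 0), (3, 4), (5, 2), (3, 4), (8, 1), (4, 0)]
```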

Feature Extraction

slide-65
SLIDE 65

65

129

Structural Analysis: Run-length encoding

A graph is built on the run-length encoding, where nodes are the run-lengths and edges represent the overlapping between runs in consecutive rows.

[Figure: run-length graph of a character image; overlapping runs such as (4,7), (3,3), (8,8), ... are linked and grouped into Region 1 and Region 2]

Feature Extraction

130

Structural Analysis: Chain-code

Chain-codes or Freeman codes are the simplest angular approximation. They encode each vector di between two consecutive points with a code between 0 and 7.

[Figure: the eight chain-code directions, numbered 0-7]

The codification of a string S is composed of 3 fields:

  • Starting point of the segment p0(S) = (x0, y0).
  • Segment length l(S).
  • A table of directions:

Dir(S) = [d0, d1, ..., dl(S)-1], where di ∈ [0,7], ∀i ∈ [0, l(S)-1], according to the following codification:

[Figure: the eight relative positions of pixel di with respect to pixel di-1, one for each code 0-7]
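A small Python sketch of the three-field codification (the direction numbering below assumes code 0 = east, increasing counter-clockwise with the y axis pointing up, which may differ from a particular implementation):

```python
import numpy as np

# Freeman code for each neighbor displacement (dx, dy)
FREEMAN = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
           (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    """Return p0(S), l(S) and Dir(S) for an ordered 8-connected contour."""
    pts = np.asarray(points, dtype=int)
    dirs = [FREEMAN[(int(dx), int(dy))] for dx, dy in np.diff(pts, axis=0)]
    return tuple(pts[0]), len(dirs), dirs
```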

Feature Extraction

slide-66
SLIDE 66

66

131

Structural Analysis: Contour Analysis

  • Contour pixels are coded according to their orientations, using the chain-code representation
  • Classification can be done with methods of structural recognition based on the string-edit distance algorithm by Wagner and Fischer (a sketch follows below)

Example chain codes for two contours: 2222122020, 1000070070
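The Wagner-Fischer string-edit distance can be sketched in a few lines of Python (unit costs here; a refined version for chain codes would make the substitution cost depend on the angular difference between the two directions):

```python
def edit_distance(s, t):
    """Wagner-Fischer dynamic programming: minimum number of
    substitutions, insertions and deletions turning s into t."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

# Compare the two chain-coded contours of the example:
print(edit_distance("2222122020", "1000070070"))
```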

Feature Extraction

132

Structural Analysis: Skeleton Analysis

  • Image thinning and representation of the skeleton with some encoding method allowing to compare two shapes (sometimes it is necessary to vectorize the skeleton):

– Chain codes
– Graphs
– Grammars
– Zoning
– Discrete features

  • Skeleton problems:

– Noise sensitivity
– Variability of the representation

[Figure: character “B” and its skeleton]

Feature Extraction

slide-67
SLIDE 67

67

133

Structural Analysis: Skeleton Analysis

  • Representation with graphs or grammars

– Based on the detection of characteristic skeleton points and a polygonal approximation of the skeleton
– Two possibilities to represent the skeleton with a graph:

  • Nodes are the characteristic points, while edges are the segments joining the points
  • Nodes are the segments of the polygonal approximation, while edges represent the adjacency relations between the segments

[Figure: the two graph representations of a skeleton, with vertices v1-v7 and segments a1-a7]

Feature Extraction

134

Structural Analysis: Skeleton Analysis

Representation with graphs:

Feature Extraction

  • J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995
slide-68
SLIDE 68

68

135

Structural Analysis: Skeleton Analysis

  • Zoning
  • Discrete features:

– Number of loops
– Number of T joints and X joints
– Number of terminal points, corner points and isolated points
– Crossing points with horizontal and vertical axes

[Figure: character divided into nine zones labeled A-I]

Option 1: stroke length within each zone. Option 2: coding from the arcs: ArC, ArD, CcF, DrF, DrG, FcI, GrI, where r = straight line and c = curve.

Feature Extraction

136

Structural Analysis : Topological and geometric features

  • Aspect ratio x-y.
  • Perimeter, area, center of gravity
  • Minimal and maximal distance of the contour to the center of gravity
  • Number of holes
  • Euler number = (n. of connected components) - (n. of holes)
  • Compactness = (perimeter)² / (4π·area)
  • Information about contour curvature
  • Ascenders and descenders
  • Concavities and holes
  • Loops
  • Unions, terminal points, crossings with horizontal and vertical axes
  • Angular information: histogram of segment angles
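Several of these features can be computed directly from a binary character image. A minimal numpy sketch (the contour approximation is our own: ink pixels with at least one 4-connected background neighbor):

```python
import numpy as np

def geometric_features(img):
    """A few of the features listed above for a binary image (True = ink)."""
    img = np.asarray(img, dtype=bool)
    ys, xs = np.nonzero(img)
    area = xs.size
    aspect = (np.ptp(xs) + 1) / (np.ptp(ys) + 1)       # x-y aspect ratio
    cx, cy = xs.mean(), ys.mean()                      # center of gravity
    p = np.pad(img, 1)                                 # pad with background
    interior = p[1:-1, :-2] & p[1:-1, 2:] & p[:-2, 1:-1] & p[2:, 1:-1]
    contour = img & ~interior                          # approximate contour
    perimeter = np.count_nonzero(contour)
    compactness = perimeter ** 2 / (4 * np.pi * area)
    by, bx = np.nonzero(contour)
    r = np.hypot(bx - cx, by - cy)                     # contour-to-centroid distances
    return {"area": area, "aspect": aspect, "perimeter": perimeter,
            "compactness": compactness, "r_min": r.min(), "r_max": r.max()}
```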

Feature Extraction

slide-69
SLIDE 69

69

137

Components of an OCR system

[Diagram: components of an OCR system]

ACQUISITION → DOCUMENT PRE-PROCESSING → SEGMENTATION → CHARACTER PRE-PROCESSING → FEATURE EXTRACTION → CLASSIFICATION → POST-PROCESSING

  • Document pre-processing: filtering, binarization, skew correction
  • Segmentation: layout analysis, text/graphics separation, character segmentation
  • Character pre-processing: filtering, normalization
  • Feature extraction: image-based, statistical, transform-based and structural features
  • Post-processing: context information
  • A learning stage provides the models used for classification

Classification

138

Classification

  • Different methods, depending on the model of feature

representation

  • Classification using feature vectors

– Correlation
– Euclidean distance
– Mahalanobis distance
– k nearest neighbours
– Bayes’ classifier
– Neural networks

  • Classification with structural features

– Dichotomic search
– String edit
– Graph matching
– Grammars

Classification

slide-70
SLIDE 70

70

139

Classification using feature vectors

  • Correlation

– Classification with the class with the largest correlation value

  • Minimal euclidean distance

– Distance to the mean mi of the class
– It does not take into account differences in variance for each class

$$D_i(x) = (x - m_i)^T (x - m_i)$$

$$S_i(f) = \frac{\iint_R f(x,y)\,g_i(x,y)\,dx\,dy}{\sqrt{\iint_R f^2(x,y)\,dx\,dy\;\iint_R g_i^2(x,y)\,dx\,dy}}$$
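A compact Python sketch of both criteria (names are ours; the class means and templates would come from a learning set):

```python
import numpy as np

def nearest_mean_classify(x, means):
    """Minimal Euclidean distance: D_i(x) = (x - m_i)^T (x - m_i)."""
    d = [float((x - m) @ (x - m)) for m in means]
    return int(np.argmin(d))

def correlation_classify(f, templates):
    """Normalized correlation of image f against one template g_i per class."""
    f = f.ravel().astype(float)
    s = [float(f @ g.ravel()) / np.sqrt((f @ f) * (g.ravel() @ g.ravel()))
         for g in templates]
    return int(np.argmax(s))
```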

Classification

140

Classification using feature vectors

  • Minimal quadratic distance (Mahalanobis).

– For each class i, the mean mi and covariance matrix Si are computed from the set of samples
– The covariance matrix is taken into account when computing the distance from an image to the class i
– The feature vector of the image x is projected over the eigenvectors of the class

$$D_i(x) = (x - m_i)^T S_i^{-1} (x - m_i) = z^T z \qquad z = \Lambda^{-1/2}\,\Psi^T (x - m_i)$$

where Λ are the eigenvalues of Si and Ψ its eigenvectors.
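A sketch of the learning step and of the distance itself (our own names; one array of samples per class is assumed):

```python
import numpy as np

def fit_class(samples):
    """Learning step: per-class mean and covariance matrix."""
    X = np.asarray(samples, dtype=float)
    return X.mean(axis=0), np.cov(X, rowvar=False)

def mahalanobis_classify(x, means, covs):
    """Pick the class with minimal (x - m_i)^T S_i^{-1} (x - m_i)."""
    d = [float((x - m) @ np.linalg.solve(S, x - m))
         for m, S in zip(means, covs)]
    return int(np.argmin(d))
```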

Classification

slide-71
SLIDE 71

71

141

Classification using feature vectors

  • k-Nearest Neighbors

– Several sample models for each class
– Given an image, we take the k models nearest to the image
– The image is classified into the class with the most elements in the set of k nearest neighbors

  • Weighted several nearest neighbors

– k depends on the image. For each image x, the set Vx contains all the models with a distance lower than α times the distance to the nearest model

$$D_i(x) = \frac{k(V_x^i)}{k(V_x)} \qquad \text{or, weighted,} \qquad D_i(x) = \sum_{x_j \in V_x^i} d(x, x_j)^{-2}$$

where Vx is the set of models nearest to x and Vx^i are the models inside Vx that belong to class i (a sketch of both rules follows).
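A Python sketch of the two rules (names are ours; `models` is an array of feature vectors and `labels` holds the integer class of each model):

```python
import numpy as np

def knn_classify(x, models, labels, k=5):
    """Majority vote among the k models nearest to x."""
    d = np.linalg.norm(models - x, axis=1)
    return int(np.bincount(labels[np.argsort(d)[:k]]).argmax())

def weighted_nn_classify(x, models, labels, alpha=1.5):
    """Variable neighborhood V_x: every model closer than alpha times the
    distance to the nearest one; vote weighted by inverse squared distance."""
    d = np.maximum(np.linalg.norm(models - x, axis=1), 1e-12)
    in_vx = d <= alpha * d.min()
    return int(np.bincount(labels[in_vx], weights=1.0 / d[in_vx] ** 2).argmax())
```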

Classification

142

Classification using feature vectors

  • Bayes’ classifier

– An image x is classified into the class i that maximizes the posterior probability p(wi|x)
– Applying Bayes’ theorem:
– p(x) is constant and independent of the class i; therefore, it has no influence on classification
– If all classes have the same prior probability p(wi), we can discard it too. Then:

$$p(w_i|x) = \frac{p(x|w_i)\,p(w_i)}{p(x)}$$

where p(wi|x) is the posterior probability that the image vector x belongs to class wi, p(x|wi) is the likelihood of observing x given that it belongs to class wi, and p(x) is the probability of observing the image vector x.

$$\arg\max_i\,p(w_i|x) = \arg\max_i\,p(x|w_i)$$

Classification

slide-72
SLIDE 72

72

143

Classification using feature vectors

  • If we assume a normal distribution for each class with mean mi and

covariance Si, estimated from the set of samples for each class:

  • Discarding constants, taking squares and applying the logarithm, we can

infer the following discriminant function:

$$p(x|w_i) = \frac{1}{(2\pi)^{n/2}\,|S_i|^{1/2}}\;e^{-\frac{1}{2}(x-m_i)^{T} S_i^{-1}(x-m_i)}$$

$$D_i(x) = -\log|S_i| - (x - m_i)^{T} S_i^{-1}(x - m_i)$$
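Putting the two previous slides together, a minimal sketch of a Bayes classifier with one Gaussian per class (names are ours):

```python
import numpy as np

def gaussian_discriminant(x, m, S):
    """D_i(x) = -log|S_i| - (x - m_i)^T S_i^{-1} (x - m_i)."""
    diff = x - m
    _, logdet = np.linalg.slogdet(S)         # log-determinant of S_i
    return -logdet - float(diff @ np.linalg.solve(S, diff))

def bayes_classify(x, means, covs):
    """Classify x into the class with the largest discriminant value."""
    return int(np.argmax([gaussian_discriminant(x, m, S)
                          for m, S in zip(means, covs)]))
```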

Classification

144

Classification using feature vectors

  • Neural networks

– They can be applied directly to the image or to a feature vector previously computed from the image
– The most used neural networks are multi-layer feed-forward networks

Y. LeCun et al.: Backpropagation applied to handwritten zip code recognition. Neural Computation, vol. 1, pp. 541-551, 1989.

Each layer represents a feature subvector of a higher level

Classification

slide-73
SLIDE 73

73

145

Classification using feature vectors

  • A neural network is organized into several layers. Each layer has a fixed

number of nodes

  • In the first layer, the nodes correspond to the values in the feature vector
  • In the last layer, the nodes correspond to each of the classes
  • Intermediate (hidden) layers represent feature subvectors of a higher level
  • The value at each node in a given layer is computed through the application of a propagation function to the values of the nodes in the previous layer, weighted by a vector of weights:

$$x_j^k = \sigma\!\left(b_j^k + \sum_{i=1}^{N(k-1)} w_{ji}^k\,x_i^{k-1}\right)$$

where xik is the value of node i at layer k, N(k) is the number of nodes of layer k, bjk is the bias of node j at layer k, and wjik is the weight of the connection between node i of layer k-1 and node j of layer k.

Classification

146

Classification using feature vectors

  • To design a neural network, we have to decide:

– The number of layers
– The number of nodes at each layer

  • Learning step, using a set of samples

– The weights of each connection are automatically determined in such a way that they minimize the classification error

  • Example of neural network

– Three-layer perceptron (the input accounts for one layer). The final discriminant function for each class is:

$$D_i(x) = f\!\left(b_i^{(2)} + \sum_{j} w_{ij}^{(2)}\,f\!\left(b_j^{(1)} + \sum_{k=1}^{N} w_{jk}^{(1)} x_k\right)\right) \qquad f(x) = \frac{1}{1+e^{-x}}\ \text{(sigmoid)}$$
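The forward pass of this three-layer perceptron amounts to two matrix products (a sketch with our own names; in practice W1, b1, W2, b2 would be learned by back-propagation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))         # f(x) = 1 / (1 + e^{-x})

def perceptron_discriminants(x, W1, b1, W2, b2):
    """D_i(x) for every class i: output of the three-layer perceptron."""
    hidden = sigmoid(W1 @ x + b1)           # hidden-layer node values
    return sigmoid(W2 @ hidden + b2)        # one discriminant value per class
```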

Classification

slide-74
SLIDE 74

74

147

Classification using structural features

  • Dichotomic search. The presence or absence of certain primitives is tested in several steps. Character models can be organized with decision trees
  • String or graph matching. Application of algorithms for string edit or for matching attributed graphs
  • Grammars. We test if the character belongs to the language generated by the grammar that represents the model

Classification

148

Classification using Deformable models

  • Deformable models

– We start with an ideal representation of the shape of the object
– This ideal shape is deformed using a set of rules or pre-defined operations, in such a way that all possible valid object distortions can be generated
– Given an image, we look for the object deformation that yields the best result for an energy function defined as the combination of two measures:

  • Internal energy: it measures the degree of deformation from the model of the object
  • External energy: it measures the degree of similarity of the deformation to the image

– The image is classified into the model with the lowest global energy

Classification

slide-75
SLIDE 75

75

149

Classification using Deformable models

  • Based on a character prototype:

– We define a prototype of the character that can be deformed by applying a series of trigonometric transforms over the image space
– Basis of the transform:

$$e_{mn}^{x} = \left(2\sin(\pi n x)\cos(\pi m y),\ 0\right) \qquad e_{mn}^{y} = \left(0,\ 2\cos(\pi m x)\sin(\pi n y)\right)$$

– External energy based on the distance of the deformation to the image contour
– Bayesian combination of internal and external energy

Classification

150

Classification using Deformable models

  • Based on a set of point generators located on the surface of a

spline:

– Internal energy:

  • The character is represented by a spline. We can modify the control points of the

spline

  • A probability is generated based on the modification of these control points

– External energy:

  • Image points are generated from generators located along the spline
  • A probability is defined based on the distance from image pixels to point

generators

– Minimization:

  • Probabilistic combination
  • EM algorithm

Classification

slide-76
SLIDE 76

76

151

Classification using Deformable models

  • Point distribution model:

– The model is represented as the mean of a set of points obtained from the skeleton of the learning samples
– PCA is applied to obtain the set of valid character deformations from the learning set
– For each image, we get the nearest deformation according to the space defined by PCA:

$$x = \bar{x} + P b \qquad b \approx P^{T}(x - \bar{x})$$

where x̄ is the mean set of points, P the matrix of principal deformation modes and b the deformation parameters.

– Internal energy: based on the deformation parameters b
– External energy: distance between the image and the image obtained using the nearest deformation

Classification

152

Holistic methods

  • Used for recognition of handwritten script
  • Each word is a recognition unit. We try to recognize each word

using global features of it. Each word is a different class

  • Based on psychological evidence
  • Applications:

– Constrained domains with only a few words (bank applications, personal agendas, etc.)
– Filtering of large domains to reduce the set of possible words

  • Usually, they are based on the application of HMM or dynamic

programming (string edit) to words (not to letters)

Classification

slide-77
SLIDE 77

77

153

Holistic strategies

  • Global word features:

– Distribution of segment orientations (horizontal, vertical, diagonal)
– Terminal points
– Concavities
– Holes
– Loops
– Word length
– Ascenders and descenders
– Crossing points with the central line
– Fourier coefficients

  • Feature representation:

– Feature vectors or matrices: the word is divided into zones; each zone corresponds to a component of the feature vector, where the presence or absence of the features is tested
– Graphs: adjacency or neighboring relations between detected features

Classification

154

Hidden Markov Models

  • Hidden Markov Model

– It represents a double stochastic process where a Markov chain is not directly observable; it can only be inferred from the observation of another stochastic process
– In an HMM, a sequence of observations O = (o1, ..., oT) is produced by a sequence of states Q = (q1, ..., qT)
– An HMM is modelled by λ = {V, S, Π, A, B, Γ}:

  • V = {vk; 1 ≤ k ≤ M}: set of observable symbols
  • S = {si; 1 ≤ i ≤ N}: set of states
  • Π = {πi}, πi = P(q1 = si): initial probability of each state
  • A = {aij}, aij = P(qt+1 = sj | qt = si): transition probabilities between states
  • Γ = {γj}, γj = P(qT = sj): probability of the final state
  • B = {bj(k)}, bj(k) = P(ot = vk | qt = sj): probability of observing a symbol in a given state

Hidden Markov Models

slide-78
SLIDE 78

78

155

Hidden Markov Models

Example: Markov Model to model the weather

States: S = {rainy, sunny}
Set of observations (possible values of humidity): V = {0%, 25%, 50%, 75%, 100%}
Sequence of observations: O = {0%, 25%, 25%, 50%, 25%, 75%, 50%}

[Figure: two states, sunny (s) and rainy (p), with transition probabilities P(s|s), P(s|p), P(p|s), P(p|p) and observation probabilities P(0%|s), ..., P(100%|s) and P(0%|p), ..., P(100%|p)]

With the model we can:

  • Compute the probability that the sequence of observations can be generated with the model of the weather
  • Given a set of observations of humidity, estimate the model of the weather
  • Decide if the weather has been rainy or sunny for each of the days under observation

Hidden Markov Models

156

Hidden Markov Models

  • Three problems related to an HMM:

– Evaluation problem: given a sequence of observations O and a model λ = {V, S, Π, A, B, Γ}, find P(O|λ), the probability that the sequence of observations can be generated from the model

  • Computed by summing the probabilities of the sequence of observations over all possible sequences of states
  • Forward/backward propagation methods

– Learning problem: given a learning set O, find the parameters of the model λ that maximize P(O|λ)

  • Baum-Welch algorithm

– Recognition problem: given a sequence of observations O and a model λ = {V, S, Π, A, B, Γ}, find the optimal sequence of states Q

  • Viterbi algorithm (a sketch follows below)
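As an illustration of the recognition problem, a compact Python sketch of the Viterbi algorithm (our own array conventions: pi has one entry per state, A is the state-transition matrix and B holds the per-state observation probabilities):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """Most probable state sequence for observations obs (indices into V)."""
    N, T = len(pi), len(obs)
    delta = np.zeros((T, N))                # best path probability so far
    psi = np.zeros((T, N), dtype=int)       # back-pointers
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        trans = delta[t - 1][:, None] * A   # trans[i, j] = delta_i * a_ij
        psi[t] = trans.argmax(axis=0)
        delta[t] = trans.max(axis=0) * B[:, obs[t]]
    q = [int(delta[-1].argmax())]           # backtrack the optimal path
    for t in range(T - 1, 0, -1):
        q.append(int(psi[t][q[-1]]))
    return q[::-1]
```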

Hidden Markov Models

slide-79
SLIDE 79

79

157

Hidden Markov Models. OCR applications

  • 1. External segmentation with context post-process: the Markov model represents grapheme variations, based on bigram or trigram frequencies, i.e. the probabilities of finding sequences of consecutive letters in the dictionary of the language
  • 2. Internal segmentation: the Markov chain represents character features extracted from left to right along the text. These features are compared with the model and, along with the recognition, it can be decided where a character ends and the next one begins
  • 3. Holistic methods: the Markov chain represents variations within a given word, belonging to a lexicon with the valid words

Hidden Markov Models

158

Hidden Markov Models. OCR applications

  • One state per letter
  • Transition probabilities are the probabilities of finding two consecutive letters in the language
  • The observations are the pre-segmented zones of the image
  • The probabilities of the observations given a state are the probabilities that each pre-segmented zone corresponds to the letter associated to the state

[Figure: states a, m, n, ..., with transition probabilities such as pam, pmm, pma, pnm, pmn, paa, pnn]

Hidden Markov Models

External segmentation with context post-process

slide-80
SLIDE 80

80

159

Hidden Markov Models. OCR applications

  • Model-discriminant HMM:

– An HMM for each letter
– Recognition is done letter by letter
– States within an HMM represent zones within the letter with different feature values
– Word recognition is done by finding the best combination of individual HMMs according to all segmentation possibilities

Hidden Markov Models

Internal segmentation

160

Hidden Markov Models. OCR applications

  • Path-discriminant HMM

– One single HMM
– Each state corresponds to a letter
– Transitions between states correspond to the probability of changing from one letter to another within a word
– Over-segmentation of the image: each observation corresponds to a set of features extracted from each segment
– The probability of each observation for a given state is the probability of generating the set of features from a given letter
– Recognition looks for the sequence of states (letters) that best generates the set of features extracted from the image

[Figure: fully connected HMM with one state per letter: a, b, c, ...]

Hidden Markov Models

Internal segmentation

slide-81
SLIDE 81

81

161

Hidden Markov Models. OCR applications

  • Example: HMM for word recognition

– One HMM for each word
– Each letter is represented with four states. Each state is a possible result of a previous over-segmentation
– Recognition looks for the HMM (word) that maximizes the probability of generating the sequence of observations (features extracted from the image)

[Figure: left-to-right HMM with four states per letter: c, a, ...]

Hidden Markov Models

Holistic methods

162

Hidden Markov Models. OCR applications

  • Feature extraction:

– Computation of 9 features inside a sliding window:

  • Number of pixels
  • Center of gravity
  • Moments of order 2
  • Location of the upper and lower contour
  • Orientation of the upper and lower contour
  • Number of black-white transitions in the vertical direction
  • Number of black pixels between the upper and lower contours

  • One HMM for each character
  • HMMs are concatenated to compose words. Find the combination of HMMs with the highest probability
  • Results:

– Vocabulary: 2296 words
– Test set: 3850 words, 80 people
– Recognition: 82.05%

  • S. Gunter, H. Bunke: HMM-based handwritten word recognition: on the optimization of the number of states, training iterations and Gaussian components. Pattern Recognition, vol. 37, pp. 2069-2079, 2004.

slide-82
SLIDE 82

82

163

Components of an OCR system

[Diagram: components of an OCR system]

ACQUISITION → DOCUMENT PRE-PROCESSING → SEGMENTATION → CHARACTER PRE-PROCESSING → FEATURE EXTRACTION → CLASSIFICATION → POST-PROCESSING

  • Document pre-processing: filtering, binarization, skew correction
  • Segmentation: layout analysis, text/graphics separation, character segmentation
  • Character pre-processing: filtering, normalization
  • Feature extraction: image-based, statistical, transform-based and structural features
  • Post-processing: context information
  • A learning stage provides the models used for classification

Post-process

164

Post-process

  • Post-processing to improve OCR results:

– Voting: combination of several classifiers
– Utilization of context information: analysis of the classification of individual characters in the context of the adjacent characters

Post-process

slide-83
SLIDE 83

83

165

Post-process. Classifier Combination

Combination of specific classifiers with good performance for some characters, fonts, etc.

[Diagram: the text is processed in parallel by Classifier 1, Classifier 2, ..., Classifier k; their individual outputs are combined by a voting stage, possibly using additional knowledge, to produce the final output]

  • Integrity: how the voting algorithm controls the activation and configuration of the individual classifiers. High integrity => the voting algorithm decides the best classifiers in each situation
  • Representation of the classification results:

– Abstract: each classifier simply gives the label of the class
– Ranked: each classifier gives several ranked labels
– Ranked with a degree of confidence: for each label, the classifier gives a level of confidence

  • Combination of the classifier results: how to combine the results

Post-process

166

Post-process. Classifier Combination

[Figure: voting example for an unknown character x; individual classifiers output c17 = ‘s’, c21 = ‘S’, c4 = ‘g’, c9 = ‘8’]

Post-process

slide-84
SLIDE 84

84

167

Post-process. Context information

  • Context analysis tries to correct errors produced by decisions taken as a function of local features
  • In the presence of uncertainty, the hypotheses generated by the local classifier are complemented with the hypotheses of neighboring characters
  • Two points of view:

– Geometric context (typographic)
– Linguistic context

[Figure: an ambiguous image that can be read as “13” or “B”]

Post-process

168

Post-process. Context information

  • Methods based on n-grams (combinations of n letters):

– Probability that an n-gram appears in the words of the dictionary
– When there are characters with uncertainty, we take the final decision according to the n-gram with the highest probability
– Bottom-up techniques
– Viterbi algorithm and Markov methods

  • Methods based on grammars:

– A grammar is used to validate the results of the OCR
– Similar to n-grams, but they permit to consider variable-length strings and recursion

[Example: an OCR output ambiguous between “vint-cents” and “vuit-cents” is validated as “vuit-cents” by the grammar Xifra → Desena ‘-’ Unitat | Unitat ‘-’ Centena]

Post-process

slide-85
SLIDE 85

85

169

Post-process. Context information

  • Methods based on a dictionary:

– Creation of a dictionary with the set of correct words
– It permits orthographic correction of the text
– Utilization of string-edit algorithms
– Problem: words not included in the dictionary
– Requires structures that represent the dictionary while providing quick access

Automaton for the dictionary: {LLIBRE, LLIURE, CAURE, COURE, COST}

[Figure: a letter automaton sharing the common prefixes of the dictionary words, and a hash function h mapping a word (“paraula”) to a position h(k) in the dictionary]
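Ignoring the efficient dictionary structures, the basic correction step can be sketched with the edit distance alone (a hypothetical example; the dictionary words are those of the automaton above):

```python
from functools import lru_cache

def correct_word(word, dictionary):
    """Replace an OCR output by the dictionary entry at minimal edit distance."""
    def dist(s, t):
        @lru_cache(maxsize=None)
        def d(i, j):
            if i == 0 or j == 0:
                return i + j
            return min(d(i - 1, j) + 1,                          # deletion
                       d(i, j - 1) + 1,                          # insertion
                       d(i - 1, j - 1) + (s[i - 1] != t[j - 1])) # substitution
        return d(len(s), len(t))
    return min(dictionary, key=lambda w: dist(word, w))

print(correct_word("LLIBPE", ["LLIBRE", "LLIURE", "CAURE", "COURE", "COST"]))
# -> LLIBRE (one substitution away)
```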

Post-process

170

Examples of OCR systems

  • Printed characters:

– Readstar (Innovatic)
– Cuneiform (Cognitive Technology)
– Word Scan (Calera)
– OmniPage (Caere)
– Text Bridge (Xerox)
– Neuro Talker OCR (Int. Neural Machines)
– OCR Master
– Recognita Plus
– TypeReader Professional
– Etc.

  • Hand-written characters:

– Mitek.

Examples

slide-86
SLIDE 86

86

171

Examples of OCR systems

  • Inspection of surgical sachets

– Digit recognition: reference and date
– System requirements:

  • Irregular surface: shadows and reflections
  • Resolution: 175 dpi
  • Acquisition with a B/W camera
  • Diffuse lighting
  • Detection of 16 defects in 400 ms
  • Verification system

Examples

172

Examples of OCR systems

  • Pre-processing

– Skew correction: combination of the angle of the upper contour and the segments of the box surrounding the word LOT
– Binarization: Otsu
– Thinning

Examples

slide-87
SLIDE 87

87

173

Examples of OCR systems

  • Segmentation

– Connected components from the skeleton
– Application of domain knowledge (character size and separation) to segment touching or broken characters:

  • Divide wide components
  • Join thin and nearby components

Examples

174

Examples of OCR systems

  • Feature extraction: zoning

– The size of each zone is not constant: it is adapted to the image size
– Two versions:

  • Version 1, value at each zone: a measure of the importance of the zone with respect to the whole character:

– 1 if the number of pixels is greater than a percentage of the total number of pixels in the image; 0 otherwise
– The central region is more important: its value is multiplied by 2
– Three values are added to combine the values of the more discriminant zones

  • Version 2, value at each zone: percentage of white pixels in the zone

[Figure: example zone values (0.0, 0.5, 2.0) and the zone weight map, with the central zones weighted 2 and the rest weighted 1]

Examples

slide-88
SLIDE 88

88

175

Examples of OCR systems

  • Classification

– Version 1:

  • Model with the minimal distance between the image and model feature vectors:

$$d = \sum_{j=1}^{n} |i_j - m_j|$$

  • If several models have the same minimal distance and the digit to verify is among the candidates, it is verified with an ambiguous signal

– Version 2:

  • Mahalanobis distance
  • Learning step to compute the mean and covariance for each digit

– Verification of the digit string:

  • The string is rejected if there is more than one mis-recognized digit or more than two ambiguous digits

Examples

176

Bibliography

  • S. Mori, H. Nishida, H. Yamada. Optical Character Recognition. John Wiley and Sons, 1999.
  • H. Bunke, P.S.P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific Publishing Company, 1997.
  • S.V. Rice, G. Nagy, T.A. Nartker. Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, 1999.
  • S. Impedovo. Fundamentals in Handwriting Recognition. Springer-Verlag, 1994.
  • A. Belaïd, Y. Belaïd. Reconnaissance des formes. Méthodes et Applications. Inter Editions, Paris, 1992.
  • A.C. Downton, S. Impedovo. Progress in Handwriting Recognition. World Scientific Publishing Company, 1997.
  • P.S.P. Wang. Character and Handwriting Recognition: Expanding Frontiers. Special Issue of IJPRAI, Vol. 5, nums. 1-2, 1991.
  • T. Pavlidis, S. Mori. Optical Character Recognition. Special Issue of Proceedings of the IEEE, Vol. 80, no. 7, 1992.
  • V.K. Govindan, A.P. Shivaprasad. Character Recognition - A Review. Pattern Recognition, Vol. 23, no. 7, pp. 671-683, 1990.

Bibliography

slide-89
SLIDE 89

89

177

Bibliography

  • O.D. Trier, A.K. Jain, T. Taxt. Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, Vol. 29, no. 4, pp. 641-662, 1996.
  • R.G. Casey, E. Lecolinet. A Survey of Methods and Strategies in Character Segmentation. IEEE Transactions on PAMI, Vol. 18, no. 7, pp. 690-706, 1996.
  • Y. Lu. Machine Printed Character Segmentation - An Overview. Pattern Recognition, Vol. 28, no. 1, pp. 67-80, 1995.
  • Y. Lu, M. Shridar. Character Segmentation in Handwritten Words - An Overview. Pattern Recognition, Vol. 29, no. 1, pp. 77-96, 1996.
  • S.W. Lee. Advances in Handwriting Recognition. World Scientific, 1999.
  • C.H. Chen, L.F. Pau, P.S.P. Wang. Handbook of Pattern Recognition and Computer Vision. World Scientific, 1993.
  • J.L. Blue et al. Evaluation of Pattern Classifiers for Fingerprint and OCR Applications. Pattern Recognition, Vol. 27, no. 4, pp. 485-501, 1994.
  • R. Plamondon, S. Srihari. On-line and Off-line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on PAMI, Vol. 22, no. 1, pp. 63-84, 2000.
  • G. Nagy. Twenty Years of Document Image Analysis in PAMI. IEEE Transactions on PAMI, Vol. 22, no. 1, pp. 38-62, 2000.
  • H. Bunke, T. Caelli. Hidden Markov Models: Applications in Computer Vision. World Scientific, 2001.
  • R. Duda, P. Hart, D. Stork. Pattern Classification. 2nd ed., Wiley Interscience, 2000.

Bibliography

178

Practical work

Binarization (local adaptive): document image → binary image
Layout analysis: binary image → N binary text images
Character segmentation: text images → N character images
Feature extraction: character image → feature vector
Classification (multiple classifiers): feature vector → character label

Groups of 2 people for each task