
SLIDE 1

Handprinted Character/Digit Recognition using a Multiple Feature/Resolution Philosophy

J.T. Favata, G. Srikantan, and S.N. Srihari

CEDAR, State University of New York at Buffalo, USA

Abstract

This paper outlines the philosophy, design and implementation of the Gradient, Structural, Concavity (GSC) recognition algorithm, which has been used successfully in several document reading applications at CEDAR. The GSC algorithm takes a quasi multi-resolution approach to feature generation. This philosophy coupled with the appropriate classification function results in a recognizer which has both high accuracy and good reject behavior. This allows it to be used in higher level digit string and word recognition algorithms which search for digit/character boundaries. Tests of the GSC classifier on standard digit, character and non-character databases are reported.

1. Introduction

Many different approaches have been used by researchers to solve the problem of machine digit and character recognition [Suen92]. These approaches have included investigations of: feature sets [Srik93] [Man86], classifier algorithms, multiple combinations of classifiers [Ho93] and novel statistical methods [Klein93]. There can be much overlap between different methods and a precise taxonomy can prove difficult. It would be safe to say that the precise classification of an algorithm has much to do with the perspective of the investigator during the design of the algorithm.

Many different algorithms have been explored by the researchers at CEDAR [Lee93]. These algorithms have encompassed a wide range of feature and classifier types. Every algorithm has characteristics, such as high speed, high accuracy, good thresholding ability, and generalization, which are useful for specific applications. Examples of classifiers developed at CEDAR are listed in Table 1. This paper will outline the philosophical and practical details of one of these classifiers: the Gradient, Structural, Concavity (GSC) classifier.

2. Philosophy

The approach used in designing the GSC was based on the observation that feature sets can be designed to extract certain types of information from the image. Feature detectors can be built to detect the local, intermediate and global features of an image. The basic unit of an image is the pixel and we are interested in both its location (x,y coordinate) and the relationship of the pixel to its neighbors at different ranges, from local to global. This


Favata, Srikantan and Srihari, Proc. IWFHR 1994, pp. 57-66.

SLIDE 2

can be expressed by saying we want to determine the relationship of each pixel to every other pixel at increasing distances. In a sense, we are taking a multi-resolution approach to feature generation. The GSC features approximate a multi-resolution approach by being generated at three ranges: local, intermediate and global. The gradient features detect local features of the image and provide a great deal of information about stroke shape at a short distance. The structural features extend the gradient features to longer distances and give certain useful information about stroke trajectories. The concavity features are used to detect certain stroke relationships at long distances which can span across the image.

In practice, there are computationally imposed limits to how a particular philosophy can be implemented. In the GSC algorithm, certain decisions were made in the exact detection and representation of the features to result in a practical algorithm. The exact implementation should not distract from the underlying philosophy. It should be emphasized that we are presenting one particular implementation of our philosophy and that others are possible.

The total feature vector length is 512 bits. It is important to note that the feature vector is binary. The GSC feature vector is very compact; other algorithms may use a smaller number of multi-valued (or real) features but the effective number of bits to represent such feature vectors can actually be quite large.
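The 512-bit budget can be checked with a few lines of arithmetic. The sketch below uses the per-group sizes given in Section 3 (192 gradient, 192 structural and 128 concavity bits) and shows one possible way to hold a binary feature vector in a single integer. The helper names and the 32-bit width assumed for a real-valued feature are illustrative assumptions, not part of the GSC implementation.

```python
# Bit budget of the GSC feature vector, using the sizes stated in Section 3.
GRID_CELLS = 4 * 4
GRADIENT_BITS = GRID_CELLS * 12     # 12 direction bins per cell
STRUCTURAL_BITS = GRID_CELLS * 12   # 12 structural features per cell
CONCAVITY_BITS = GRID_CELLS * 8     # 1 density + 2 stroke + 5 concavity bits per cell
TOTAL_BITS = GRADIENT_BITS + STRUCTURAL_BITS + CONCAVITY_BITS


def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python int (first bit is the MSB)."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value


def real_vector_bits(num_features, bits_per_value=32):
    """Storage for a real-valued vector, assuming (hypothetically) 32-bit values."""
    return num_features * bits_per_value
```

Under these assumptions, a vector of only 64 real-valued features already needs `real_vector_bits(64)` = 2048 bits, four times the size of the binary GSC vector.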

3. Feature Description

The GSC algorithm was designed to work with binarized images, so it is presumed that the image has been thresholded using a suitable algorithm [Otsu79]. The image is slant normalized using a moment based algorithm to reduce the effects of skew. A bounding box is placed around the image and the features are computed (see below). The feature maps are sampled by placing a 4x4 grid on the maps (see Figure 1). The features themselves are computed independently of this sampling grid.

Gradient Features

The gradient features are computed by convolving two Sobel operators on the binary image. These operators approximate the x and y derivatives in the image. The vector addition of the operators' output is used to compute the gradient of the image. Although the gradient is a vector with magnitude and direction, only the direction is used in the computation of the feature vector. The direction of the gradient can range from 0 to 359 degrees. This range is split into 12 non-overlapping regions of 360/12 = 30 degrees each. In each sampling region (one cell of the 4x4 grid), a histogram is taken of the gradient direction at each pixel which lies in the region. A threshold is applied to the histogram and the feature bit is set for each direction count that exceeds the threshold. This subset of the GSC features produces 12*4*4=192 bits of the total feature vector.
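The gradient feature computation described above can be sketched in a few lines. This is a minimal illustration, not the CEDAR implementation: it assumes a binary image given as a list of rows of 0/1 values, treats pixels outside the image as background, assigns pixels to cells of a uniform 4x4 grid, and uses a hypothetical default threshold of zero. The kernel is applied without flipping, so angles are expressed in image coordinates with y pointing down.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_at(image, row, col, kernel):
    """Apply a 3x3 kernel at (row, col); pixels outside the image count as 0."""
    height, width = len(image), len(image[0])
    total = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            r, c = row + dr, col + dc
            if 0 <= r < height and 0 <= c < width:
                total += kernel[dr + 1][dc + 1] * image[r][c]
    return total


def gradient_features(image, grid=4, bins=12, threshold=0):
    """Return grid*grid*bins feature bits from quantized gradient directions."""
    height, width = len(image), len(image[0])
    counts = [[[0] * bins for _ in range(grid)] for _ in range(grid)]
    for row in range(height):
        for col in range(width):
            gx = sobel_at(image, row, col, SOBEL_X)
            gy = sobel_at(image, row, col, SOBEL_Y)
            if gx == 0 and gy == 0:
                continue  # flat area: the gradient has no direction
            angle = math.degrees(math.atan2(gy, gx)) % 360.0
            direction = int(angle // (360.0 / bins)) % bins
            # Map the pixel to its cell in the uniform sampling grid.
            counts[row * grid // height][col * grid // width][direction] += 1
    # One bit per (cell, direction): set when the count exceeds the threshold.
    return [1 if n > threshold else 0
            for grid_row in counts for cell in grid_row for n in cell]
```

With the default arguments the function returns 4*4*12 = 192 bits, ordered cell by cell and, within each cell, by direction bin.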


SLIDE 3

grid size of 5x5. Other stroke features detected are four types of corners which consist of perpendicular co-occurrences of strokes. These features contribute 4x4x12=192 bits to the total feature vector.

Concavity Features

These features, which are the coarsest of the GSC set, can be broken down into three subclasses of features. The total contribution of these features is 4x4x8=128 bits.

Subclass A. Coarse Pixel Density Features

These features capture the general groupings of pixels in the image. They are computed by placing the 4x4 sampling grid on the image and counting the number of image pixels that fall into each grid region. Thresholding converts these area counts into a single bit for each region. This feature contributes 4x4=16 bits of the feature vector.

Subclass B. Large Stroke Features

These features attempt to capture large horizontal and vertical strokes in the image. Run lengths of horizontal and vertical black pixels across the image are first computed. From this information, the presence of strokes is determined by testing for stroke lengths above a threshold. This feature contributes 4x4x2=32 bits of the feature vector.

Subclass C. U/D/L/R/H Concavity Features

These features are computed by convolving the image with a star-like operator. This operator shoots rays in eight directions and determines what each ray hits. A ray can hit an image pixel or the edge of the image. A table is built for the termination status of the rays emitted from each white pixel of the image. A computationally efficient algorithm similar to run-length encoding is actually used to compute the star operator. The class of each pixel is determined by applying rules to the termination status patterns of the pixel. Currently upward/downward and left/right pointing concavities are detected along with holes. The rules are relaxed to allow nearly enclosed holes (broken holes) to be detected as holes. This gives a bit more robustness to noisy images. These features can overlap, in that in certain cases more than one feature can be detected at a pixel location. These features contribute 4x4x5=80 bits.
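The star operator of Subclass C can be illustrated with a naive ray-casting sketch. The paper states that a more efficient run-length style algorithm is actually used and does not list the classification rules, so the rules and label names below are hypothetical: a white pixel is called a hole when all eight rays hit ink, and a concavity when exactly three of the four axis rays hit ink, opening toward the direction whose ray escapes to the image edge. The relaxed rules for broken holes are not modeled.

```python
# Eight ray directions as (d_row, d_col): N, NE, E, SE, S, SW, W, NW.
DIRECTIONS = ((-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1))


def ray_hits_ink(image, row, col, d_row, d_col):
    """Follow one ray from (row, col); True if it meets ink before the image edge."""
    height, width = len(image), len(image[0])
    row, col = row + d_row, col + d_col
    while 0 <= row < height and 0 <= col < width:
        if image[row][col]:
            return True
        row, col = row + d_row, col + d_col
    return False


def classify_background_pixel(image, row, col):
    """Label a white pixel from its ray termination pattern (illustrative rules)."""
    hits = [ray_hits_ink(image, row, col, dr, dc) for dr, dc in DIRECTIONS]
    if all(hits):
        return "hole"
    # Axis-aligned rays only: up, right, down, left.
    blocked = [hits[0], hits[2], hits[4], hits[6]]
    if blocked.count(True) == 3:
        labels = ("open_up", "open_right", "open_down", "open_left")
        return labels[blocked.index(False)]
    return None
```

For a U-shaped stroke the pixels inside the cup are labeled `open_up`, while the center of a closed ring is labeled `hole`. Turning these per-pixel labels into the 4x4x5 feature bits would follow the same grid sampling and thresholding as the other feature maps.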

4. Classification

The classification problem can be stated as finding functions which map feature vectors to classes. Ideally these functions should map all valid regions of the feature space to the class space. With high dimensional feature spaces, this can be a very difficult problem. There may never be enough training data to adequately estimate these functions in certain regions of


SLIDE 4

Structural Features

The structural features capture certain patterns embedded in the gradient map. These patterns are "microstrokes" of the image. Several 3x3 operators are passed over the gradient map to locate small strokes pointing up/down and diagonally. These strokes are combined into larger features using a rule table. The largest span of the feature covers a

Table 1. CEDAR Classifiers

Name          Performance (digits)   Speed (digits/sec)
Image         96.02%                 66.7
Binpoly       96.43%                 25.0
Chaincode     98.33%                 67.0
Gabor*        97.70%                 6.7
Gradient      98.46%                 5.3
Histogram*    97.47%                 62.5
Morphology*   97.92%                 0.5
GSC           98.87%                 10.0

Notes:
1. (*) indicates classifiers which are no longer actively being used.
2. Performance and speed figures are approximate and used for relative comparison.

Figure 1. GSC features


SLIDE 5

the feature space. In addition, we can also assume that our features may not have enough resolving power to distinguish among certain classes of images, or there may be an aliasing problem where two different image patterns may map into the (nearly) same feature vector. This can sometimes be due to practical implementation compromises involved in the algorithm design such as the need for speed or machine memory size limitations. Since we are dealing with handwritten images, it is also possible that certain cases of digits or characters may be so borderline or poorly written that even humans could misrecognize them.

In computing the classification functions, we would like to generate functions that accurately reflect the training set and generalize well to the testing sets. This means that the functions should accurately recognize members of the training set and smoothly roll off as the feature vectors move away from the labeled vectors of the training set. That is, as the underlying image is smoothly transformed into another image of a different class, we would like to see a smooth transition of the classification function. This property is useful to prevent spurious responses in those regions of the feature space which are inadequately represented in the training set. In addition, we would like good behavior in invalid regions of the space. This last property is important if we use the recognizer to distinguish between valid and invalid images, such as using the system to find valid digits from a segmenter. This work has focused on several different classifiers, each with various tradeoffs:

k-NN

The k-nearest neighbor (k-nn) approach attempts to compute a classification function by using the labeled training points as nodes or anchor points in the n (n=512) dimensional space. In a sense, this is the most detailed description of the space that is possible from the training samples. Rather than using a 1-nearest neighbor classifier, we chose a k-nn classifier to reduce the effect of mislabeled training data and to get a better estimate of the prototype density at any particular point in the space. By choosing an appropriate distance metric a smooth rolloff in response can be obtained as a feature vector is moved away from a cluster center. Since the feature vector is binary, the comparison between unknown and labeled vectors involves bit operations which can be done in parallel at the machine's word length. The main disadvantages of k-nn are the memory required to hold the training data and the speed in classification. A clustering technique has been developed to greatly reduce the number of comparisons necessary to implement this classifier and it is now comparable in speed to other classifiers.

Neural Nets

Classification functions based on feed-forward neural network architecture have been explored at CEDAR with some success. It is our experience to date that these functions do not perform well with the GSC feature set, especially with characters. The problems that occur are typically poor accuracy and ill-behaved response to invalid (non-character/digit)


SLIDE 6

test images. We are currently investigating the reasons for this unsatisfactory behavior. Some of the problems may have to do with the limited training data available for certain character classes. This is especially true for recognition of cursive characters, for which there are no standard training and testing databases available yet. We are experimenting with fuzzy training techniques. This approach uses the confidence output from another recognizer algorithm as the target values of the neural net during training. Early tests have shown performance improvements in the neural net.

Polynomial Functions

This set of classification functions is computed as low order polynomial functions of the feature coefficients (bits) using a minimum least-squares error criterion. In general, these classifiers perform poorly compared to k-nn; however, they can be reasonably well behaved with respect to invalid images. Both a linear and a quadratic function classifier have been used with success in a word recognition algorithm [Fav93].
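The bit-level comparison mentioned in the k-NN discussion above can be sketched as follows. This is a minimal illustration under stated assumptions: each 512-bit feature vector is held as a Python integer, plain Hamming distance (XOR followed by a bit count) stands in for the matching functions the paper actually uses, and the confidence is simply the fraction of the k nearest prototypes that agree. The clustering technique used to speed up the search is not shown.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of differing bits between two feature vectors stored as ints."""
    return bin(a ^ b).count("1")


def knn_classify(unknown, training, k=3):
    """Return (label, confidence) by majority vote among the k nearest prototypes.

    `training` is a list of (feature_int, label) pairs.
    """
    nearest = sorted(training, key=lambda item: hamming_distance(unknown, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    label, count = votes.most_common(1)[0]
    return label, count / len(nearest)
```

The XOR operates on whole machine words at a time, which is the parallelism the text refers to. A confidence defined this way falls as an unknown vector drifts toward prototypes of other classes, which is one simple way to obtain the rolloff behavior discussed in this section.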

5. Refinements

A number of refinements of both feature generation and classification have been tried with various results. The most successful improvement to the feature generation was to use a variable 4x4 sampling grid on the image. A horizontal and vertical histogram of the image is computed and sampling lines are placed on the equi-mass divisions of the histogram. This results in higher sampling of regions with the most mass. A significant improvement in recognition has been obtained with this scheme.

A number of different matching metrics [Tubb89] have been tried with the k-nn classifier. A good metric, or more appropriately, matching function, blends the number of features that co-occurred with the number of features that did not match or were not present.

6. Applications

The GSC classifier has been used successfully in several different applications. The first application is the segmentation of ZIP codes from digitized postal address images. Typically, a handwritten ZIP code contains 5 or 9 isolated digits which usually can be read by a digit recognizer. In some cases, however, two or more of the adjacent digits can be touching in unpredictable ways. One approach for accurate recognition is to force a segmentation of the digit string and recognize each segmented digit individually. The GSC classifier is a key element of this approach.

Another successful application of the GSC algorithm is in word recognition. A handwritten word is a sequence of characters whose identities must be determined. This can be accomplished by segmenting the word and determining character boundaries and identities. For printed words, segmentation is usually easy, but for cursively written words it is much more difficult.
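The variable sampling grid described in Section 5 can be sketched as below. The paper only says that sampling lines are placed on the equi-mass divisions of the horizontal and vertical histograms, so the tie handling, the fallback for an empty image and the function names are assumptions made for illustration.

```python
def equimass_cuts(histogram, parts=4):
    """Return parts-1 cut indices splitting `histogram` into near-equal mass.

    A cut value c means the boundary separates indices [..c) from [c..).
    """
    total = sum(histogram)
    if total == 0:
        # No ink at all: fall back to a uniform grid (an assumption).
        n = len(histogram)
        return [n * i // parts for i in range(1, parts)]
    cuts, running, next_part = [], 0, 1
    for index, mass in enumerate(histogram):
        running += mass
        # Emit a cut each time the cumulative mass reaches the next 1/parts share.
        while next_part < parts and running * parts >= total * next_part:
            cuts.append(index + 1)
            next_part += 1
    return cuts


def variable_grid(image, parts=4):
    """Compute row and column cut positions from ink projections of a binary image."""
    row_hist = [sum(row) for row in image]
    col_hist = [sum(col) for col in zip(*image)]
    return equimass_cuts(row_hist, parts), equimass_cuts(col_hist, parts)
```

For a projection such as `[0, 0, 2, 2, 2, 2, 0, 0]` the cuts fall at 3, 4 and 5, so the cells are concentrated where the ink is, which is the "higher sampling of regions with the most mass" that the refinement aims for. A projection with all its mass in very few bins can produce repeated cuts, that is, empty cells; how the original system handled that case is not stated.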


SLIDE 7

Experiments with the GSC classifier have shown that it excels in these applications because the confidences are reliable enough to use in detecting the best character and digit candidates after segmentation. Figure 2 shows the polynomial and GSC algorithms tested on a series of images containing noise, special marks and digit sequences. It can be seen that the GSC algorithm offers reliable confidences and that nearly every one of these images can be rejected by picking one threshold value.

Digit String Segmentation Algorithm

Given the image of a string of digits, the goal of the segmentation algorithm is to partition the image into regions, each containing an isolated digit. A recognition aided iterative method is used. Adjacent digits can be touching and some of the digits might be broken into more than one component (e.g. 5-hats). Therefore the number of digits in a digit string is not simply a count of the number of components in the field. Components which are identified as touching digits must be segmented appropriately. The module which performs the segmentation and subsequent recognition of the segmented digits does the following: given a connected component with 2 to 4 digits, the module estimates the number of digits in the component, performs the appropriate segmentations and recognizes the individual digits. The module currently performs at a 71% correct rate.

The segmenter can be invoked in two different modes: (i) estimate the number of digits and (ii) force a given digit string length. The number of digits is initially estimated from the aspect ratio of the digit string and successive estimates are obtained by a linear regression model. Digits that are recognized with high confidence are removed after each iteration. The effective contribution of the removed digits to the density of the digit string is recorded. This information is fed to a least squares linear model [Fen91]. By setting the density to zero (all digits are removed), the least squares equation can estimate the number of digits in the string. Connected digit components are split into the required number of digits. The segmenter has a correct segmentation rate of 92.9% when the string length is specified, and is 87.5% correct when the number of digits has to be estimated.

General Handwritten Word Recognition

The GSC has been applied to general handwritten word recognition in which a word can contain any mixture of discrete, cursive or touching discrete characters. A recognition technique called the hypothesis generation and reduction algorithm (HGR) [Fav93] first segments the word image using a number of candidate segmentation points. The GSC classifier is then used to pinpoint the most likely character boundaries by extracting the


SLIDE 8

strokes from among pairs of segmentation points (see Figure 3). Later, a lexicon driven post-processor finds the most likely word using the GSC segment pair confidences and other available context.
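The digit-count estimate used by the digit string segmenter (extrapolating a least squares line to zero density) can be sketched as follows. The paper does not give the regression variables, so this sketch assumes one observation per iteration, pairing the number of digits removed so far with the density remaining in the string. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def estimate_digit_count(removed_counts, remaining_density):
    """Extrapolate to zero density: the x at which the fitted line crosses y = 0."""
    intercept, slope = fit_line(removed_counts, remaining_density)
    return -intercept / slope
```

For example, if removing 0, 1 and 2 digits leaves relative densities of 1.0, 0.75 and 0.5, the fitted line reaches zero density at 4 digits. At least two observations with different removal counts are needed for the fit to be defined.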

Figure 2 - GSC thresholding test: comparison of confidences returned by the polynomial and GSC classifiers on non-characters. Output is arranged as (input image, poly decision, poly conf, GSC decision, GSC conf).

Figure 3 - Finding character boundaries. Panels: cursive segment; extracted segments and GSC character response.


SLIDE 9

7. Experimental Results

The GSC classifier has been well characterized by extensive testing. Figure 4 compares several variations of this classifier on handwritten digits. The training set consisted of 24,000 digits taken from various databases; the test set consisted of 21,000 digits. Figure 5 shows the GSC classifier performance trained separately on 60,000 upper and lower case characters chosen from the NIST handprinted database. The test set consisted of 20,000 upper and lower case characters.

Figure 4 - GSC digit results (reject vs. error)

Figure 5 - NIST upper/lower character results


SLIDE 10

Acknowledgment

The authors would like to thank Dr. Venu Govindaraju, Evie Kleinberg, Ajay Shekhawat, Phil Kilinskas and D.S. Lee for helping in various ways during the development of the GSC classifier.

8. References

[Fav93] J. Favata, "Recognition of Cursive, Discrete, and Mixed Handwritten Words using Character, Lexical and Spatial Constraints", Tech Report 93-32, Dept. of Computer Science, SUNY Buffalo, 1993.

[Fen91] R. Fenrich, "Segmentation of automatically located handwritten words", Proc. Int. Workshop in Handwriting Recognition, Bonas, France, pp. 33-44, 1991.

[Ho93] T.K. Ho, J. Hull, and S. Srihari, "Decision Combination in Multiple Classifier Systems", IEEE PAMI, Vol. 16, No. 1, pp. 66-75, Jan. 1994.

[Klein93] E. Kleinberg and T.K. Ho, "Pattern Recognition by Stochastic Modeling", in Proceedings of Third International Workshop on Frontiers in Handwriting Recognition (IWFHR III), Buffalo, NY, 1993.

[Lee93] D. Lee and S. Srihari, "Handprinted Digit Recognition: A Comparison of Algorithms", in Proceedings of Third International Workshop on Frontiers in Handwriting Recognition (IWFHR III), Buffalo, NY, 1993.

[Man86] J. Mantas, "An Overview of Character Recognition Methodologies", Pattern Recognition, Vol. 19, No. 6, pp. 425-430, 1986.

[Otsu79] N. Otsu, "A threshold selection method from grey-level histograms", IEEE Trans. on SMC, Vol. 9, pp. 62-66, Jan. 1979.

[Srik93] G. Srikantan, "Gradient representation for handwritten character recognition", in Proceedings of Third International Workshop on Frontiers in Handwriting Recognition (IWFHR III), Buffalo, NY, 1993.

[Suen92] C.Y. Suen et al., "Computer Recognition of Unconstrained Handwritten Numerals", IEEE Proceedings, Vol. 80, No. 7, pp. 1162-1180, July 1992.

[Tubb89] J.D. Tubbs, "A Note on Binary Template Matching", Pattern Recognition, Vol. 22, No. 4, pp. 359-365, 1989.
