A Voting System for Automatic Correction of OCR Output - PowerPoint PPT Presentation


SLIDE 1

A Voting System for Automatic Correction of OCR Output

SLIDE 2

Introduction

OCR = Optical Character Recognition

SLIDE 3

SLIDE 4

Known Techniques for Spelling Correction

Edit Distance: The minimum number of editing operations (i.e., insertion, deletion and substitution of letters) required to transform one string into another.
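
The edit distance defined on this slide can be sketched with the standard dynamic-programming recurrence. This is a minimal illustration, assuming every insertion, deletion and substitution costs 1; the function name `edit_distance` is chosen here for illustration and is not taken from the presentation.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of single-letter insertions, deletions and substitutions."""
    # previous[j] is the distance between the already processed prefix of
    # source and target[:j]; initially the prefix is empty (j insertions).
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]
```

For example, `edit_distance("schooi", "school")` is 1, since a single substitution turns one string into the other.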

SLIDE 5

Known Techniques for Spelling Correction (cont...)

Hashing: Skeleton Key

Example: communication → cmnctouia

SLIDE 6

So, what is the problem?

  • Most of the techniques are relevant only for typing errors.
  • Designed for isolated words.
  • We are interested in a fully automatic system.

SLIDE 7

The Algorithm

SLIDE 8

The Algorithm (cont...)

SLIDE 9

The Algorithm (cont...)

[Notation: $w_j^1$, $w_j^2$]

SLIDE 10

The Algorithm (cont...)

  • If words are identical: Accept the word.

[Notation: $w_j^1$, $w_j^2$]

SLIDE 11

The Algorithm (cont...)

If the words mismatch:

  • If only one of them is valid, then accept the valid one.

[Notation: $w_j^1$, $w_j^2$]

$$\mathrm{dictionary}(w_j^i) = 0.6 \cdot \mathrm{freq}(w_j^i, \mathrm{local\_dictionary}) + 0.4 \cdot \mathrm{freq}(w_j^i, \mathrm{global\_dictionary})$$
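
The two simple voting cases (identical words, and a mismatch where only one word is valid) and the weighted dictionary score can be sketched as below. This is a rough illustration under stated assumptions: "valid" is taken to mean membership in a word set, frequencies are supplied as plain mappings from word to a number, and the names `vote`, `dictionary_score`, `valid_words`, `local_freq` and `global_freq` are hypothetical. Only the 0.6 and 0.4 weights come from the slide.

```python
from typing import Mapping, Optional, Set

LOCAL_WEIGHT = 0.6   # weight on the local dictionary frequency (from the slide)
GLOBAL_WEIGHT = 0.4  # weight on the global dictionary frequency (from the slide)


def vote(word1: str, word2: str, valid_words: Set[str]) -> Optional[str]:
    """Return the accepted word for the two simple cases, otherwise None."""
    if word1 == word2:
        return word1  # identical OCR outputs: accept the word
    valid1, valid2 = word1 in valid_words, word2 in valid_words
    if valid1 != valid2:
        return word1 if valid1 else word2  # exactly one valid word: accept it
    return None  # both valid or both invalid: left to the scoring steps


def dictionary_score(word: str,
                     local_freq: Mapping[str, float],
                     global_freq: Mapping[str, float]) -> float:
    """0.6 * local frequency + 0.4 * global frequency; unseen words count as 0 (assumption)."""
    return (LOCAL_WEIGHT * local_freq.get(word, 0.0)
            + GLOBAL_WEIGHT * global_freq.get(word, 0.0))
```

Returning `None` for the remaining cases keeps this sketch limited to what the slide states; how those cases are resolved is covered by the scoring formulas on the following slides.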

SLIDE 12

The Algorithm (cont...)

[Formula: Candidates, defined in terms of $w_j^1$ and $w_j^2$]

SLIDE 13

The Algorithm (cont...)

$$\mathrm{is\_close}(w_j^i, w_{j_k}^i) = \begin{cases} 1 & \text{if } \mathrm{edit\_distance}(w_j^i, w_{j_k}^i) = 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\mathrm{mark}(w_{j_k}^i) = 0.6 \cdot \mathrm{error\_freq}(w_j^i, w_{j_k}^i, \mathrm{OCR}_i) + 0.4 \cdot \left( \mathrm{word\_gram}(w_{j-1}^i, w_{j_k}^i, w_{j+1}^i) + \mathrm{dictionary}(w_{j_k}^i) + \mathrm{is\_close}(w_j^i, w_{j_k}^i) \right)$$
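
A candidate's mark can be sketched as a weighted sum of its component scores. This is only an illustration under one reading of the slide's formula: weight 0.6 on the OCR error frequency, weight 0.4 on the sum of the word-gram score, the dictionary score and the closeness indicator, with "close" meaning an edit distance of exactly 1. The component values are passed in as plain numbers, and the function names and `best_candidate` helper are hypothetical.

```python
from typing import Mapping


def is_close(distance: int) -> int:
    """1 when the candidate is one edit away from the OCR word, else 0 (assumed threshold)."""
    return 1 if distance == 1 else 0


def candidate_mark(error_freq: float, word_gram: float,
                   dictionary: float, distance: int) -> float:
    """0.6 * error_freq + 0.4 * (word_gram + dictionary + is_close)."""
    return 0.6 * error_freq + 0.4 * (word_gram + dictionary + is_close(distance))


def best_candidate(marks: Mapping[str, float]) -> str:
    """Pick the candidate with the highest mark (hypothetical selection step)."""
    return max(marks, key=marks.get)
```

With made-up component values, `best_candidate({"school": 0.9, "schools": 0.4})` selects `"school"`.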

SLIDE 14

The Algorithm (cont...)

If both words are valid:

$$\mathrm{context}(w_j^i) = \mathrm{freq}(w_j^i, \mathrm{local\_dictionary}) + \mathrm{word\_gram}(w_{j-1}^i, w_j^i, w_{j+1}^i)$$

$$\mathrm{mark}(w_j^i) = \mathrm{accuracy}(\mathrm{OCR}_i) \cdot \left[ 0.6 \cdot \mathrm{context}(w_j^i) + 0.4 \cdot \mathrm{freq}(w_j^i, \mathrm{global\_dictionary}) \right]$$
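
When both words are valid, each one receives a mark and the two marks can be compared. The sketch below assumes the reading of the slide in which the context score is the local dictionary frequency plus the word-gram score, and the mark is the OCR device's accuracy times a 0.6/0.4 weighted sum. All inputs are plain numbers supplied by the caller, and the function names are hypothetical.

```python
def context_score(local_freq: float, word_gram: float) -> float:
    """Local dictionary frequency plus the word-gram score of the word in its context."""
    return local_freq + word_gram


def valid_word_mark(ocr_accuracy: float, local_freq: float,
                    word_gram: float, global_freq: float) -> float:
    """accuracy(OCR_i) * [0.6 * context + 0.4 * global frequency]."""
    context = context_score(local_freq, word_gram)
    return ocr_accuracy * (0.6 * context + 0.4 * global_freq)


def pick_word(word1: str, mark1: float, word2: str, mark2: float) -> str:
    """Accept the word with the higher mark; ties go to the first word (assumption)."""
    return word1 if mark1 >= mark2 else word2
```

The tie-breaking rule in `pick_word` is an arbitrary choice for this sketch; the slide does not say how ties are handled.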

SLIDE 15

The Algorithm (cont...)

SLIDE 16

The Experiments

SLIDE 17

The Environment of the Experiments

SourceString  ErrorString  OCR1   OCR2
e             g            1.18%  0.10%
i             1            1.58%  0.48%
f             l            0.15%  0.72%
2             z            0.30%  0.72%
da            d            0.75%  0.10%
(missing)     (missing)    3.54%  0.77%

SLIDE 18

Examples of successful corrections

OriginalWord  OCR1         OCR2         AcceptedWord
details       d,tails      detaHs       details
precisent     pr,cisent    preci~ent    precisent
neighborhood  ne~hborhood  nei&iborho~  neighborhood
school        schooi       sciiool      school
thankfully    thankfvlly   thankfiily   thankfully
we’re         we’te        we’e         we’re
survivors     surveyors    survivors    survivors
going         going        goring       going

SLIDE 19

Examples of erroneous decisions

SLIDE 20

OriginalWord  OCR1      OCR2    AcceptedWord  Errortype
2001          2ooi      zooi    zoo           4
says          savs      sas     san           4
circles       circ.les  circus  circus        3
We            the       We      the           2
little        lithe     lisle   lithe         2
true          true      toe     toe           2
100           100       too     too           2
class         dass      dass    dass          1
Ian           lan       lan     lan           1
5,600         5,6oo     5,6oo   5,6oo         1

SLIDE 21

Analysis of the Results

SLIDE 22

Analysis of the Results (cont…)

SLIDE 23

Analysis of the Results (cont…)

Model  System                                         Error Rate
1      OCR1 - no post-processing                      14.0%
2      OCR2 - no post-processing                      12.1%
3      OCR1 - dictionary + candidates generation      8.5%
4      OCR2 - dictionary + candidates generation      8.0%
5      Both - comparison + simple dictionary lookup   7.2%
6      Both - comparison + dictionary's frequencies   5.1%
7      Both - full correction system                  3.6%
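
The error rates on this slide can be compared as relative reductions against the unprocessed OCR output. The short sketch below only does arithmetic on the reported figures; the helper name `relative_reduction` and the model numbering in `ERROR_RATES` are introduced here for illustration.

```python
# Error rates in percent as reported on the slide, keyed by model number.
ERROR_RATES = {1: 14.0, 2: 12.1, 3: 8.5, 4: 8.0, 5: 7.2, 6: 5.1, 7: 3.6}


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction of the baseline error rate that the improved system removes."""
    return (baseline - improved) / baseline


# Full correction system (model 7) against each raw OCR output (models 1 and 2).
vs_ocr1 = relative_reduction(ERROR_RATES[1], ERROR_RATES[7])
vs_ocr2 = relative_reduction(ERROR_RATES[2], ERROR_RATES[7])
print(f"vs OCR1: {vs_ocr1:.1%}, vs OCR2: {vs_ocr2:.1%}")
```

On these figures the full system removes roughly 74% of the OCR1 errors and roughly 70% of the OCR2 errors.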

SLIDE 24

Analysis of the Results (cont…)

                                  Accept words from OCR as is  Do not accept the OCR word, and try to suggest candidates
Correct word written to output    (A) true positive            (B) true negative
Incorrect word written to output  (C) false positive           (D) false negative

$$\mathrm{Recall} = \frac{A}{A + B} = \frac{\#\ \text{correct words accepted directly from OCR}}{\#\ \text{correct words accepted}}$$

$$\mathrm{Precision} = \frac{A}{A + C} = \frac{\#\ \text{correct words accepted directly from OCR}}{\text{total}\ \#\ \text{words accepted directly from OCR}}$$
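
The two measures on this slide can be computed directly from the cell counts A, B and C. The sketch below follows the slide's own definitions (Recall = A / (A + B), Precision = A / (A + C)); the function names and the example counts are made up for illustration.

```python
def recall(a: int, b: int) -> float:
    """A / (A + B): share of correct output words that were accepted directly from OCR."""
    return a / (a + b)


def precision(a: int, c: int) -> float:
    """A / (A + C): share of directly accepted OCR words that were correct."""
    return a / (a + c)


# Hypothetical counts: 90 words in cell A, 10 in cell B, 30 in cell C.
print(recall(90, 10), precision(90, 30))
```

Both functions raise `ZeroDivisionError` when the denominator is zero; a real evaluation script would need to decide how to report that case.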

SLIDE 25

Analysis of the Results (cont…)

[Figure: Recall vs. Precision, plotted for model1 through model7; axes: Recall, Precision]

SLIDE 26

Further Work

  • More OCR devices.
  • Context: NLP techniques, word classes.
  • Specifications for a certain language.