A Voting System for Automatic Correction of OCR Output
Introduction
OCR = Optical Character Recognition
Known Techniques for Spelling Correction
Edit Distance: The minimum number of editing operations (i.e., insertion, deletion and substitution of letters) required to transform one string into another.
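The definition above matches the usual dynamic-programming computation of Levenshtein distance. The following is a minimal Python sketch, assuming a unit cost for each of the three operations (the slides do not state costs); the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn `source` into `target` (unit cost each)."""
    # previous[j] holds the distance between the prefix of `source`
    # processed so far and the first j characters of `target`.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute or match
            ))
        previous = current
    return previous[-1]


# "schooi" / "school" is one of the OCR confusions listed later in the slides.
print(edit_distance("schooi", "school"))
```

Only two rows of the table are kept in memory at a time, which is enough when the distance itself is needed and not the sequence of edits.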
Known Techniques for Spelling Correction (cont...)
Hashing:
- Skeleton Key, e.g. communication → cmnctouia
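The slides give only the example key for "communication", not the rule that produces it. The sketch below is therefore an assumption: it keeps the first letter, then appends the distinct consonants of the remaining letters in order of first appearance, then the distinct vowels. This rule reproduces the example, but other variants of the skeleton key exist. The name `skeleton_key` and the vowel set are illustrative choices.

```python
VOWELS = set("aeiou")  # assumption: plain English vowels only


def skeleton_key(word: str) -> str:
    """Hash a word to a key that many of its misspellings share.

    Assumed rule: first letter, then the distinct consonants of the
    rest of the word, then its distinct vowels, each in order of
    first appearance.
    """
    word = word.lower()
    if not word:
        return ""
    first, rest = word[0], word[1:]
    consonants, vowels = [], []
    for ch in rest:
        if not ch.isalpha():
            continue  # ignore digits and punctuation
        bucket = vowels if ch in VOWELS else consonants
        if ch not in bucket:
            bucket.append(ch)
    return first + "".join(consonants) + "".join(vowels)


print(skeleton_key("communication"))
print(skeleton_key("comunication"))  # a misspelling with a dropped letter
```

Words that hash to the same key can be stored together, so a misspelled word is looked up by its key instead of being compared against the whole dictionary.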
So, what is the problem?
- Most of the techniques are relevant only for typing errors.
- Designed for isolated words.
- We are interested in a fully automatic system.
The Algorithm
$w_j^1$, $w_j^2$: the $j$-th word in the output of OCR1 and of OCR2.
The Algorithm (cont...)
- If the words are identical: accept the word.
The Algorithm (cont...)
If the words mismatch:
- If only one of them is valid, then accept the valid one.
$$\mathrm{dictionary}(w_j^i) = 0.6 \cdot \mathrm{freq}(w_j^i, \mathrm{local\_dictionary}) + 0.4 \cdot \mathrm{freq}(w_j^i, \mathrm{global\_dictionary})$$
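The dictionary score is a fixed 0.6 / 0.4 blend of a word's frequency in the local and the global dictionary. A minimal Python sketch follows, assuming both dictionaries are plain mappings from word to relative frequency and that an unknown word has frequency 0. Neither assumption is stated in the slides, and the sample frequencies are made up for illustration.

```python
def dictionary_score(word, local_dictionary, global_dictionary):
    """0.6 * freq(word, local) + 0.4 * freq(word, global)."""
    local_freq = local_dictionary.get(word, 0.0)    # assumption: unknown -> 0
    global_freq = global_dictionary.get(word, 0.0)
    return 0.6 * local_freq + 0.4 * global_freq


# Hypothetical frequencies, for illustration only.
local_dictionary = {"school": 0.02}
global_dictionary = {"school": 0.01, "details": 0.005}

print(dictionary_score("school", local_dictionary, global_dictionary))
print(dictionary_score("schooi", local_dictionary, global_dictionary))
```

Under these assumptions a word that appears in neither dictionary scores exactly 0, which gives one possible reading of "valid": a word whose score is positive.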
The Algorithm (cont...)
[Equation: the set Candidates, defined in terms of $w_j^1$ and $w_j^2$]
The Algorithm (cont...)
$$\mathrm{is\_close}(w_j^i, w_{j_k}^i) = \begin{cases} 1 & \text{if } \mathrm{edit\_distance}(w_j^i, w_{j_k}^i) = 1 \\ 0 & \text{otherwise} \end{cases}$$
$$\mathrm{mark}(w_{j_k}^i) = 0.6 \cdot \big(\mathrm{error\_freq}(w_j^i, w_{j_k}^i, OCR^i)\big) + 0.4 \cdot \big(\mathrm{word\_gram}(w_{j-1}^i, w_{j_k}^i, w_{j+1}^i) + \mathrm{dictionary}(w_{j_k}^i) + \mathrm{is\_close}(w_j^i, w_{j_k}^i)\big)$$
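The candidate mark combines four signals: how often the OCR device makes this particular error, how well the candidate fits between its neighbouring words, its dictionary score, and whether it lies one edit away from the OCR word. The Python sketch below makes several assumptions. The grouping of terms is taken as 0.6·error_freq + 0.4·(word_gram + dictionary + is_close). The values of `error_freq`, `word_gram` and `dictionary` are passed in as precomputed numbers, because the slides do not say how they are obtained. The numbers in the example are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + (ca != cb)))
        previous = current
    return previous[-1]


def is_close(ocr_word: str, candidate: str) -> int:
    """1 if the candidate is exactly one edit away from the OCR word."""
    return 1 if edit_distance(ocr_word, candidate) == 1 else 0


def candidate_mark(error_freq: float, word_gram: float,
                   dictionary: float, close: int) -> float:
    """Assumed grouping: 0.6*error_freq + 0.4*(word_gram + dictionary + close)."""
    return 0.6 * error_freq + 0.4 * (word_gram + dictionary + close)


# Hypothetical scores for the candidate "school" given the OCR word "schooi".
close = is_close("schooi", "school")
print(candidate_mark(error_freq=0.5, word_gram=0.1, dictionary=0.2, close=close))
```

Keeping the component scores as plain arguments makes it easy to swap in whatever error statistics and word n-gram model are available.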
The Algorithm (cont...)
If both words are valid:
$$\mathrm{context}(w_j^i) = \mathrm{freq}(w_j^i, \mathrm{local\_dictionary}) + \mathrm{word\_gram}(w_{j-1}^i, w_j^i, w_{j+1}^i)$$
$$\mathrm{mark}(w_j^i) = \mathrm{accuracy}(OCR^i) \cdot \big[0.6 \cdot \mathrm{context}(w_j^i) + 0.4 \cdot \mathrm{freq}(w_j^i, \mathrm{global\_dictionary})\big]$$
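When both OCR words are valid, each one gets a mark that scales its context and global-frequency evidence by the accuracy of the device that produced it. The Python sketch below rests on assumptions: the word with the higher mark is accepted (the slides imply this without stating it), ties go to OCR1, and all numeric inputs are hypothetical. The function names are illustrative.

```python
def context(local_freq: float, word_gram: float) -> float:
    """context(w) = freq(w, local_dictionary) + word_gram(prev, w, next)."""
    return local_freq + word_gram


def mark(accuracy: float, context_score: float, global_freq: float) -> float:
    """mark(w) = accuracy(OCR) * [0.6 * context(w) + 0.4 * freq(w, global)]."""
    return accuracy * (0.6 * context_score + 0.4 * global_freq)


def vote(word1: str, mark1: float, word2: str, mark2: float) -> str:
    """Assumption: accept the word with the higher mark; ties go to OCR1."""
    return word1 if mark1 >= mark2 else word2


# Hypothetical inputs for a disagreement between two valid words.
mark1 = mark(accuracy=0.86, context_score=context(0.3, 0.2), global_freq=0.25)
mark2 = mark(accuracy=0.88, context_score=context(0.2, 0.1), global_freq=0.25)
print(vote("surveyors", mark1, "survivors", mark2))
```

The accuracy factor lets the more reliable device win when the linguistic evidence for the two words is roughly equal.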
The Experiments
SourceString  ErrorString  OCR1   OCR2
[missing]     [missing]    3.54%  0.77%
da            d            0.75%  0.10%
2             z            0.30%  0.72%
f             l            0.15%  0.72%
i             1            1.58%  0.48%
e             g            1.18%  0.10%
The Environment of the Experiments
Examples of successful corrections
OriginalWord  OCR1         OCR2         AcceptedWord
going         going        goring       going
survivors     surveyors    survivors    survivors
we’re         we’te        we’e         we’re
thankfully    thankfvlly   thankfiily   thankfully
school        schooi       sciiool      school
neighborhood  ne~hborhood  nei&iborho~  neighborhood
precisent     pr,cisent    preci~ent    precisent
details       d,tails      detaHs       details
Examples of erroneous decisions
OriginalWord  OCR1      OCR2    AcceptedWord  ErrorType
5,600         5,6oo     5,6oo   5,6oo         1
Ian           lan       lan     lan           1
class         dass      dass    dass          1
100           100       too     too           2
true          true      toe     toe           2
little        lithe     lisle   lithe         2
We            the       We      the           2
circles       circ.les  circus  circus        3
says          savs      sas     san           4
2001          2ooi      zooi    zoo           4
Analysis of the Results
Analysis of the Results (cont…)
Model  Configuration                               Error Rate
1      OCR1 – no post-processing                   14.0%
2      OCR2 – no post-processing                   12.1%
3      OCR1 – dictionary + candidates generation   8.5%
4      OCR2 – dictionary + candidates generation   8.0%
5      Both – comparison + simple dictionary lookup  7.2%
6      Both – comparison + dictionary’s frequencies  5.1%
7      Both – full correction system               3.6%
Analysis of the Results (cont…)
                                  Accept words from OCR as is   Do not accept the OCR word, and try to suggest candidates
Correct word written to output    (A) true positive             (B) true negative
Incorrect word written to output  (C) false positive            (D) false negative
$$\mathrm{Recall} = \frac{A}{A+B} = \frac{\#\ \text{correct words accepted directly from OCR}}{\#\ \text{correct words accepted}}$$
$$\mathrm{Precision} = \frac{A}{A+C} = \frac{\#\ \text{correct words accepted directly from OCR}}{\#\ \text{total words accepted directly from OCR}}$$
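Both measures follow directly from the counts A, B and C of the table above. A minimal Python sketch follows; the counts in the example are hypothetical and are not results from the experiments.

```python
def recall(a: int, b: int) -> float:
    """A / (A + B): share of correct output words that were accepted as is."""
    return a / (a + b)


def precision(a: int, c: int) -> float:
    """A / (A + C): share of words accepted as is that were correct."""
    return a / (a + c)


# Hypothetical counts, for illustration only.
A, B, C = 90, 10, 5
print(f"recall = {recall(A, B):.3f}, precision = {precision(A, C):.3f}")
```

Count D (false negatives) does not enter either measure, since both are defined relative to the words accepted directly from the OCR.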
Analysis of the Results (cont…)
[Figure: Recall vs. Precision; axes: Recall, Precision; legend: model1, model2, model3, model4, model5, model6, model7]
Further Work
- More OCR devices.
- Context: NLP techniques, word classes.
- Specifications for a certain language.