Handwriting Recognition for Genealogical Records



SLIDE 1

Handwriting Recognition for Genealogical Records

Luke Hutchison (lukeh@email.byu.edu)
[Advisor: Dr. Tom Sederberg]

SLIDE 2

Handwriting Recognition

  • Two different fields:
      • Online Handwriting Recognition
          • The writer's pen movements are captured
          • Velocity, acceleration, stroke order available
      • Offline Handwriting Recognition
          • Page was previously written and scanned
          • Only pixel color information available
  • Genealogical records are all offline
  • Offline is harder (less information is available)

[Figure: handwriting sample "Mary"]
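
To make the online/offline distinction concrete, here is a minimal Python sketch of how the two kinds of input might be represented. The names `PenSample` and `stroke_speeds`, the data layout, and the sample values are illustrative assumptions, not taken from the presentation.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class PenSample:
    """One online sample: pen position plus a timestamp (hypothetical layout)."""
    x: float
    y: float
    t: float  # seconds since the stroke began


def stroke_speeds(stroke):
    """Return the average pen speed between consecutive samples of one stroke."""
    speeds = []
    for a, b in zip(stroke, stroke[1:]):
        dt = b.t - a.t
        if dt > 0:
            speeds.append(hypot(b.x - a.x, b.y - a.y) / dt)
    return speeds


# Online data: an ordered list of timed samples, so order and speed are known.
online_stroke = [PenSample(0, 0, 0.0), PenSample(3, 4, 0.5), PenSample(3, 10, 1.0)]

# Offline data: only a pixel grid (1 = ink, 0 = background); no order, no timing.
offline_image = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 0],
]
```

Speed can be derived directly from the online samples, whereas nothing in `offline_image` says where the pen started or how fast it moved.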

SLIDE 3

Handwriting Recognition

  • Can we just convert offline data into (simulated) online data?
  • Yes, although difficult to do reliably:
      • What order were the strokes written in?
      • Doubled-up line segments? Ink blobs? Spurious joins between letters? Missing joins?
  • Especially difficult with genealogical records

SLIDE 4

Handwriting Recognition

  • A successful approach must combine results from analysis of different domains, and at different levels of abstraction, e.g.
      • Discrete:
          • Stroke segmentation and ordering
          • Digraph frequency tables, lexicons
      • Continuous:
          • Letter shape analysis and matching

SLIDE 5

Handwriting Recognition

  • An example of some common steps in the analysis process:
      • Contour extraction
      • Midline determination
      • Stroke ordering
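
The slides do not say which contour algorithm is used. As a rough illustration of the first step only, the sketch below marks an ink pixel as a contour pixel when any of its four neighbours is background. The function name `contour_pixels` and the 0/1 grid format are assumptions made for this example.

```python
def contour_pixels(image):
    """Return the set of (row, col) ink pixels that touch background.

    `image` is a list of rows of 0/1 values (1 = ink). A pixel is on the
    contour if any of its 4-neighbours is background or lies outside the image.
    """
    rows, cols = len(image), len(image[0])
    contour = set()
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                outside = not (0 <= nr < rows and 0 <= nc < cols)
                if outside or not image[nr][nc]:
                    contour.add((r, c))
                    break
    return contour


# A 3x3 block of ink surrounded by background.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
outline = contour_pixels(blob)
```

Only the centre pixel of the block is interior; the other eight ink pixels form the outline. Midline determination and stroke ordering would then work from a result like this.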

SLIDE 6

Handwriting Recognition

  • An example of some steps in the recognition process:
      • Handwriting style clustering
      • Letter recognition (e.g. "nr" or "m"?)
      • Approximate string matching (e.g. "Smith" vs. "Smythe")
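
One common way to do approximate string matching is Levenshtein edit distance; the presentation does not state which measure it uses, so treat this as a stand-in. The function name `edit_distance` is chosen for this example.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]
```

Here `edit_distance("Smith", "Smythe")` is 2 (substitute "i" with "y", then append "e"), so the two spellings can be treated as near matches.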

SLIDE 7

HR for Genealogical Records

  • Image quality is not always good with microfilms
      • Fading of documents / microfilm
      • Ink-well pens
  • But documents were usually written meticulously
      • Older handwriting more regular; simpler to match
  • Different approach required

SLIDE 8

The Approach

  • Outlines of words are traced and smoothed
  • Some common sources of variation (e.g. differences in slope) are automatically corrected for.
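
Slope (slant) correction is often done with a horizontal shear. The sketch below assumes the slant angle has already been estimated by some other step, which the slides do not describe. The function `deslant` and its parameters are hypothetical.

```python
from math import radians, tan


def deslant(points, slant_degrees):
    """Shear outline points horizontally so slanted strokes become upright.

    `points` are (x, y) pairs with y measured upward from the baseline;
    `slant_degrees` is the estimated slant from vertical (positive = leaning right).
    """
    k = tan(radians(slant_degrees))
    return [(x - k * y, y) for x, y in points]


# A stroke leaning 45 degrees to the right becomes (nearly) vertical.
upright = deslant([(0, 0), (10, 10)], 45)
```

Points on the baseline (y = 0) do not move, so the word keeps its horizontal position while its strokes are straightened.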

SLIDE 9

The Approach

  • Robustly produce a characteristic “signature” for each letter

SLIDE 10

The Approach

  • Find possible letter matches and determine possible readings (with accuracy of fit)

[Figure: candidate letter matches for each position in the handwritten name]

=> Williarw Suwkino (65%), ... , JiiUiom Oartums (1%)
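
One simple way to turn per-letter matches into ranked readings is to multiply the fit of each chosen letter and sort the results. This is a sketch under that independence assumption, not necessarily how the presented system scores readings. The function `ranked_readings` and all fit values below are made up for illustration.

```python
from itertools import product
from math import prod


def ranked_readings(candidates, top=3):
    """Combine per-letter candidates into whole-word readings, best first.

    `candidates` has one entry per letter position; each entry is a list of
    (letter, fit) pairs with fit in (0, 1]. A reading's score is the product
    of its letters' fits, i.e. positions are treated as independent.
    """
    readings = []
    for combo in product(*candidates):
        word = "".join(letter for letter, _ in combo)
        score = prod(fit for _, fit in combo)
        readings.append((word, score))
    readings.sort(key=lambda ws: ws[1], reverse=True)
    return readings[:top]


# Hypothetical fits for a three-letter fragment; the numbers are invented.
fragment = [
    [("S", 0.9), ("O", 0.1)],
    [("u", 0.6), ("a", 0.4)],
    [("w", 1.0)],
]
best = ranked_readings(fragment)
```

Exhaustive enumeration grows exponentially with word length, so a real system would need pruning (for example a beam search); that part is left out here.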

SLIDE 11

The Approach

  • Error Correction: Letter digraph frequencies

      E _    2.617%
      E R    1.438%
      N _    1.280%
      A N    1.276%
      _ S    1.212%
      O N    1.207%
      I N    1.187%
      E N    1.174%
      [...]
      A W    0.075%
      N K    0.074%
      T L    0.071%
      [...]
      U W    0.000%

Suwkino --> Sawkino
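
The correction above can be sketched as a lookup that flags improbable letter pairs. The frequencies are copied from the slide's excerpt. The function `suspicious_digraphs`, the 0.01% threshold, and the reading of "_" as a word boundary are assumptions made for this example.

```python
# Frequencies (percent) from the slide's excerpt; "_" is assumed to mark a word boundary.
DIGRAPH_PERCENT = {
    "E_": 2.617, "ER": 1.438, "N_": 1.280, "AN": 1.276,
    "_S": 1.212, "ON": 1.207, "IN": 1.187, "EN": 1.174,
    "AW": 0.075, "NK": 0.074, "TL": 0.071, "UW": 0.000,
}


def suspicious_digraphs(word, table=DIGRAPH_PERCENT, threshold=0.01):
    """Return the letter pairs in `word` whose known frequency is below `threshold`.

    Pairs missing from the (partial) table are skipped rather than guessed.
    """
    padded = "_" + word.upper() + "_"
    pairs = [padded[i:i + 2] for i in range(len(padded) - 1)]
    return [p for p in pairs if p in table and table[p] < threshold]
```

With this excerpt, "Suwkino" is flagged for the pair "UW" while "Sawkino" is not, which matches the substitution shown on the slide. A complete table would be needed to score every pair in a word.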

SLIDE 12

The Approach

  • Error Correction: Name Lexicon
      • Last names:
          Smith       1.105%
          Jones       0.817%
          Williams    0.653%
          Brown       0.371%
          [...]
          Sawkins     0.012%
      • First names:
          James       1.615%
          John        1.203%
          Robert      1.022%
          Michael     0.971%
          William     0.954%

=> William Sawkins (95%)
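
A lexicon lookup of this kind can be approximated with Python's standard `difflib` module. The similarity measure here (`SequenceMatcher.ratio`) is a stand-in for whatever matching the presented system uses, and the function `closest_name` is hypothetical. The frequencies are the ones listed on the slide.

```python
from difflib import SequenceMatcher

# Frequencies (percent) copied from the slide's excerpt.
LAST_NAMES = {"Smith": 1.105, "Jones": 0.817, "Williams": 0.653,
              "Brown": 0.371, "Sawkins": 0.012}
FIRST_NAMES = {"James": 1.615, "John": 1.203, "Robert": 1.022,
               "Michael": 0.971, "William": 0.954}


def closest_name(reading, lexicon):
    """Return the lexicon name most similar to `reading`.

    Similarity is difflib's ratio (0..1); ties go to the more frequent name.
    """
    def key(name):
        similarity = SequenceMatcher(None, reading.lower(), name.lower()).ratio()
        return (similarity, lexicon[name])

    return max(lexicon, key=key)


first = closest_name("Williarw", FIRST_NAMES)
last = closest_name("Sawkino", LAST_NAMES)
```

This sketch always returns some name, even for a poor match, and it does not reproduce the 95% confidence figure; a real system would combine the similarity with the letter-fit scores and reject matches below a cutoff.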

SLIDE 13

Conclusions

  • [Work in progress]
  • A (semi-)automated extraction system could dramatically reduce extraction time
  • [Demo: Concept search engine...]