Handwriting Recognition for Genealogical Records: PowerPoint Presentation
FHT 2003: Handwriting Recognition for Genealogical Records
Luke Hutchison, lukeh@email.byu.edu
SLIDE 1
SLIDE 2
Church Extraction Effort
- Nov 2002: the Church released the US 1880 and Canadian 1881 censuses
  - 55 million names
  - 11 million man-hours
- Granite Vault: contains 2.3 million rolls of microfilm (about 6 million 300-page volumes)
- Approximate extraction time for one person (based on the above census): 280 years, 24/7
- We don't have that sort of time
- Need automated extraction: handwriting recognition
SLIDE 3
Example Microfilm Images
SLIDE 4
Handwriting Recognition
- Two different fields:
  - Online Handwriting Recognition
    - Writer's pen movements captured
    - Velocity, acceleration, stroke order, etc.
    - Style can be constrained (e.g. Graffiti gestures)
  - Offline Handwriting Recognition
    - Only pixels
    - Cannot constrain style (documents already written)
- Offline is harder (less information)
- Genealogical records are all offline
(Figure: example word "Mary")
SLIDE 5
Online Handwriting Recognition
- Modern systems are moderately successful, e.g. Microsoft Research's new Tablet PC
(Figure: strokes represented as polynomial coefficients, e.g. [0.94, 0.05, 0.29, ...])
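The slide's figure summarises pen strokes as polynomial coefficients. The talk does not say how those coefficients are computed. The sketch below shows one plausible way. It assumes each stroke is a list of sampled (x, y) pen positions and fits x(t) and y(t) separately by least squares. The names `polyfit` and `stroke_features`, the degree, and the 0-to-1 time scale are illustrative assumptions, not Microsoft's method.

```python
def polyfit(ts, vs, degree):
    """Least-squares fit vs[i] ~ sum_j c[j] * ts[i]**j; returns [c0, c1, ..., c_degree]."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T v, where A[i][j] = ts[i] ** j.
    ata = [[sum(t ** (r + c) for t in ts) for c in range(n)] for r in range(n)]
    atv = [sum(v * t ** r for t, v in zip(ts, vs)) for r in range(n)]
    # Forward elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[piv] = ata[piv], ata[col]
        atv[col], atv[piv] = atv[piv], atv[col]
        for r in range(col + 1, n):
            f = ata[r][col] / ata[col][col]
            for c in range(col, n):
                ata[r][c] -= f * ata[col][c]
            atv[r] -= f * atv[col]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(ata[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (atv[r] - rest) / ata[r][r]
    return coeffs


def stroke_features(points, degree=3):
    """Hypothetical stroke descriptor: polynomial coefficients of x(t) followed by y(t)."""
    # Assumption: t runs evenly from 0 to 1 over the stroke's samples.
    ts = [i / (len(points) - 1) for i in range(len(points))]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return polyfit(ts, xs, degree) + polyfit(ts, ys, degree)
```

Consider a stroke sampled exactly on the parabola y = x². With `degree=2`, the result is coefficients close to [0, 1, 0] for x(t) and [0, 0, 1] for y(t). A recogniser would then compare such fixed-length vectors instead of raw pen samples.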
SLIDE 6
Offline Handwriting Recognition
- A difficult problem
- Almost as many approaches as there are researchers, e.g.:
  - Pattern Recognition
  - Statistical analysis
  - Mathematical modelling
  - Physics-based modelling
  - Subgraph matching / graph search
  - Neural networks / machine learning
  - Fractal image compression
  - ... (too many to list) ...
SLIDE 7
Previous Work: Offline-to-Online Conversion
- Finding the contour
- Finding the midline
- Stroke ordering: a difficult problem
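Finding the midline of a stroke is usually done by thinning the binary image down to a one-pixel-wide skeleton. The slides do not name a method, so the sketch below uses the classic Zhang-Suen thinning algorithm as a stand-in. The input is an image given as a list of rows of 0/1 values, and the function name is hypothetical. The output is only the midline. Recovering stroke order from that skeleton is the hard part the slide warns about.

```python
def zhang_suen_thin(image):
    """Thin a binary image (list of rows of 0/1) to a one-pixel-wide skeleton."""
    img = [row[:] for row in image]          # work on a copy
    rows, cols = len(img), len(img[0])

    def px(r, c):
        # Pixels outside the image count as background.
        return img[r][c] if 0 <= r < rows and 0 <= c < cols else 0

    def neighbours(r, c):
        # P2..P9: clockwise, starting with the pixel directly above.
        return [px(r - 1, c), px(r - 1, c + 1), px(r, c + 1), px(r + 1, c + 1),
                px(r + 1, c), px(r + 1, c - 1), px(r, c - 1), px(r - 1, c - 1)]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for r in range(rows):
                for c in range(cols):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # number of foreground neighbours
                    # a = number of 0 -> 1 transitions around the ring P2..P9, P2
                    a = sum(1 for i in range(8) if p[i] == 0 and p[(i + 1) % 8] == 1)
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_delete.append((r, c))
            # Delete simultaneously, so each sub-step sees a consistent image.
            for r, c in to_delete:
                img[r][c] = 0
            changed = changed or bool(to_delete)
    return img
```

On a solid 3 × 7 block of ink, this leaves a single horizontal run of four pixels along the middle row.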
SLIDE 8
Offline-to-Online Conversion (ctd.)
- Especially difficult with genealogical records:
  - Stroke ordering: difficult
  - Broken lines / blobs?
- Not practical
SLIDE 9
Previous Work: Holistic Matching
- The whole word is stretched to match known words
- Sources of variation compound across the word
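Stretching a word to fit a known word is a form of elastic matching. The slides do not say which matcher is meant. Dynamic time warping (DTW) is a standard choice and is sketched below. The sketch assumes each word has already been reduced to a sequence of per-column feature values, which is a hypothetical representation. `dtw_distance` and `best_match` are illustrative names.

```python
import math


def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Dynamic time warping cost between two feature sequences a and b."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Stretch one sequence (repeat an element) or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def best_match(word_features, templates):
    """Name of the template whose sequence warps onto word_features most cheaply."""
    return min(templates, key=lambda name: dtw_distance(word_features, templates[name]))
```

A single warping path has to absorb every local distortion in the word at once. Each badly formed letter therefore adds to the total cost, which is one way to read the slide's point that variation compounds across the word.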
SLIDE 10
Previous Work: Sliding Window
- A narrow vertical window slides across the word
- A state machine recognizes sequences
- Results good, but sensitive to noise
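The sliding-window front end can be sketched as follows. A narrow window moves across the word image, and each position is reduced to a small feature vector. The state machine (commonly a hidden Markov model) then consumes these vectors as a sequence. The window width, the step, the use of ink counts in three horizontal bands, and the function name are all assumptions for illustration.

```python
def sliding_window_features(image, width=2, step=1):
    """Slide a narrow vertical window across a binary word image (rows of 0/1).

    Each window position yields (top, middle, bottom) ink counts, a
    hypothetical feature choice for illustration.
    """
    rows, cols = len(image), len(image[0])
    bands = [(0, rows // 3), (rows // 3, 2 * rows // 3), (2 * rows // 3, rows)]
    features = []
    for start in range(0, cols - width + 1, step):
        vec = []
        for top, bottom in bands:
            ink = sum(image[r][c]
                      for r in range(top, bottom)
                      for c in range(start, start + width))
            vec.append(ink)
        features.append(tuple(vec))
    return features
```

A single stray blob of ink changes the counts of every window that overlaps it. This is one concrete way to see the noise sensitivity noted on the slide.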
SLIDE 11
Previous Work: Parascript
- Features detected & put in sequence
- Letters warped to best match sequence of features
- Complex; sensitive to noise
SLIDE 12
Handwriting Recognition
- Some aspects of handwriting recognition:
  - Segmentation problem (can't read a word until it is segmented; can't segment a word until it is read)
  - Different handwriting styles
  - Use of a dictionary to correct for errors in reading
(Figure: ambiguous letter shapes, "nr? m?"; dictionary correction: Srnitb --> Smith)
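The dictionary step can be sketched with plain Levenshtein edit distance: pick the dictionary word that needs the fewest single-character edits. This is a stand-in, not necessarily the technique used in the talk, and `correct` is a hypothetical helper. Case is ignored. "Srnitb" is three edits from "Smith": r→m, delete n, b→h.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def correct(reading, dictionary):
    """Replace a raw reading with the closest dictionary word (case-insensitive)."""
    return min(dictionary, key=lambda w: levenshtein(reading.lower(), w.lower()))
```

For example, `correct("Srnitb", ["Smith", "Smyth", "Jones", "Brown"])` returns "Smith". Plain edit distance counts "rn" → "m" as two edits. A practical system would likely give visually confusable letter pairs a lower cost.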
SLIDE 13
Thesis Approach: Preprocessing
- Outlines of the word are traced and smoothed
- Handwriting slope is corrected for automatically
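The slides do not describe how the slope correction works. One common heuristic, sketched below, tries a range of horizontal shear factors and keeps the one whose vertical projection is most sharply peaked. The sharpness score is the sum of squared column counts, because upright strokes pile their pixels into few columns. The sketch assumes foreground pixels are given as (x, y) pairs with y increasing downward. The candidate range and all function names are hypothetical.

```python
from collections import Counter


def slant_score(points, shear, baseline):
    """Sum of squared column counts after shearing; higher means more upright strokes."""
    cols = Counter(round(x - shear * (baseline - y)) for x, y in points)
    return sum(n * n for n in cols.values())


def estimate_slant(points, candidates=None):
    """Pick the shear factor (assumed candidate grid) that best straightens the strokes."""
    if candidates is None:
        candidates = [i / 10 for i in range(-10, 11)]
    baseline = max(y for _, y in points)
    return max(candidates, key=lambda s: slant_score(points, s, baseline))


def deslant(points, shear):
    """Shear pixels horizontally so that slanted strokes become vertical."""
    baseline = max(y for _, y in points)
    return [(round(x - shear * (baseline - y)), y) for x, y in points]
```

For a stroke that leans one pixel right per pixel of height, `estimate_slant` returns 1.0, and `deslant` then moves every pixel of the stroke into a single column.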
SLIDE 14
Segmentation
- Goal: robustly cut letters into segments
- Match multiple segments to detect letters
- Easier than matching a whole letter
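Cutting letters into segments that are later matched in groups is usually called over-segmentation. It errs on the side of too many cuts, because adjacent segments can be recombined later. The sketch below proposes cuts at low-ink local minima of the vertical projection profile. This criterion, the `max_ink` threshold, and both function names are assumptions; the slides do not specify the cutting rule.

```python
def candidate_cuts(image, max_ink=1):
    """Propose cut columns in a binary word image (list of rows of 0/1).

    A column qualifies if its ink count is <= max_ink and is a local minimum of
    the vertical projection; only the first column of a flat run is kept.
    """
    cols = len(image[0])
    profile = [sum(row[c] for row in image) for c in range(cols)]
    cuts, prev_candidate = [], False
    for c in range(1, cols - 1):
        is_candidate = (profile[c] <= max_ink
                        and profile[c] <= profile[c - 1]
                        and profile[c] <= profile[c + 1])
        if is_candidate and not prev_candidate:
            cuts.append(c)
        prev_candidate = is_candidate
    return cuts


def segments(image, cuts):
    """Split the column range at the cut points into (start, end) pairs."""
    bounds = [0] + cuts + [len(image[0])]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]
```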
SLIDE 15
Dynamic Global Search
- Assemble word spelling from possible letter readings
Best path: “Williarw Suwkino” (65% confidence)
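The search over letter readings can be sketched as a Viterbi-style dynamic program over segment boundaries. Each candidate letter covers one to `max_span` consecutive segments, and its probability multiplies along the path. The input format for `readings`, the independence assumption, and the function name are hypothetical. This is not a reconstruction of the thesis code.

```python
import math


def best_reading(num_segments, readings, max_span=3):
    """Most probable spelling over a left-to-right chain of segments.

    readings[(i, j)] lists (letter, probability) pairs for a letter occupying
    segments i..j-1 (hypothetical format). Returns (spelling, probability),
    or ("", 0.0) if no path covers every segment.
    """
    # best[j] = (log-probability, spelling) of the best path covering segments 0..j-1
    best = [(-math.inf, "")] * (num_segments + 1)
    best[0] = (0.0, "")
    for j in range(1, num_segments + 1):
        for i in range(max(0, j - max_span), j):
            score_i, text_i = best[i]
            if score_i == -math.inf:
                continue  # no way to reach boundary i
            for letter, p in readings.get((i, j), []):
                if p <= 0:
                    continue
                cand = (score_i + math.log(p), text_i + letter)
                if cand[0] > best[j][0]:
                    best[j] = cand
    score, text = best[num_segments]
    return (text, math.exp(score)) if score > -math.inf else ("", 0.0)
```

Without a dictionary or language model, the highest-probability path can still be a non-word like "Williarw Suwkino". Dictionary correction is meant to repair exactly that kind of output.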
SLIDE 16
Results (1)
SLIDE 17
Results (2)
SLIDE 18
Results (3)
SLIDE 19
Results (4)
In general, results were even worse: the system only worked well on words it was specifically trained on
SLIDE 20
The Human Brain's Visual System
(Diagram: retina)
SLIDE 21
The Human Brain's Visual System
(Diagram: retina → angular edge detectors)
SLIDE 22
The Human Brain's Visual System
(Diagram: retina → angular edge detectors → line / curve detectors → ...)
SLIDE 23
The Human Brain's Visual System
(Diagram: retina → angular edge detectors → line / curve detectors → feature detectors → ...)
SLIDE 24
The Human Brain's Visual System
(Diagram: retina → angular edge detectors → line / curve detectors → feature detectors → ..., with lateral inhibition and feedback)
SLIDE 25
The Human Brain's Visual System
(Diagram: retina → angular edge detectors → line / curve detectors → feature detectors → letter / word shape recognizers → ..., with lateral inhibition and feedback; label "J")
SLIDE 26
The Human Brain's Visual System
(Diagram: retina → angular edge detectors → line / curve detectors → feature detectors → letter / word shape recognizers → ..., with lateral inhibition and feedback; labels "J" and "Joseph")
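The diagram lists lateral inhibition among the brain's mechanisms. Below is a minimal computational illustration of it, not taken from the talk. Each unit in a row of detectors is suppressed by `k` times the mean response of its two neighbours. The value of `k`, the two-neighbour neighbourhood, the edge replication, and the function name are all assumptions.

```python
def lateral_inhibition(signal, k=0.5):
    """One layer of lateral inhibition on a 1-D row of detector responses.

    Each unit is reduced by k times the mean of its two neighbours (edge values
    are replicated at the ends); negative outputs are clipped to zero.
    """
    n = len(signal)
    out = []
    for i, x in enumerate(signal):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(max(0.0, x - k * (left + right) / 2))
    return out
```

Consider a step from 1 to 5, `[1, 1, 1, 5, 5, 5]`. The output is `[0.5, 0.5, 0.0, 3.5, 2.5, 2.5]`: the uniform regions are damped, while the boundary gets a dip on the dark side and an overshoot on the bright side. That exaggeration of edges is the usual explanation of what lateral inhibition contributes.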
SLIDE 27
Conclusions
- Handwriting recognition is important for genealogy... but it is hard
- Current methods don't work very well... and they don't operate much like the human brain
- Future work should focus on understanding the brain and emulating it as much as possible, e.g. with:
  - Hierarchical reasoning
  - Feedback
  - Lateral inhibition
SLIDE 28
Questions?
Luke Hutchison, lukeh@email.byu.edu
SLIDE 29