
Handwriting Recognition for Genealogical Records - PowerPoint PPT Presentation

FHT 2003: Handwriting Recognition for Genealogical Records. Luke Hutchison, lukeh@email.byu.edu


  1. FHT 2003: Handwriting Recognition for Genealogical Records
     Luke Hutchison, lukeh@email.byu.edu

  2. Church Extraction Effort
     • Nov 2002: the Church released the US 1880 and Canadian 1881 censuses
     • 55 million names
     • 11 million man-hours
     • Granite Vault: contains 2.3 million rolls of microfilm (about 6 million 300-page volumes)
     • Approximate extraction time for one person (based on the above census): 280 years, 24/7
     • We don't have that sort of time
     • Need automated extraction: handwriting recognition

  3. Example Microfilm Images

  4. Handwriting Recognition
     • Two different fields:
       • Online handwriting recognition
         ◦ The writer's pen movements are captured
         ◦ Velocity, acceleration, stroke order, etc.
         ◦ Style can be constrained (e.g. Graffiti gestures)
       • Offline handwriting recognition
         ◦ Only pixels
         ◦ Cannot constrain style (the documents are already written)
     • Offline is harder (less information)
     • Genealogical records are all offline
     (Example image: the handwritten word "Mary")
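To make "offline is harder (less information)" concrete, here is a minimal sketch. It is not from the presentation, and the `(x, y, t)` sample format and the `rasterize` helper are hypothetical. The sketch stores online ink as ordered, timestamped pen samples and renders it to pixels. Two writings of the same strokes in different orders and at different speeds look different online but produce identical offline images.

```python
from typing import List, Tuple

# Hypothetical online representation: a stroke is an ordered list of (x, y, t) pen samples.
Stroke = List[Tuple[int, int, float]]

def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Render strokes into a binary image (1 = ink); stroke order and timing are discarded."""
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y, _t in stroke:
            image[y][x] = 1
    return image

# A horizontal stroke and a vertical stroke crossing it, a "+" shape.
horizontal = [(1, 1, 0.0), (2, 1, 0.1), (3, 1, 0.2)]
vertical = [(2, 0, 0.3), (2, 1, 0.4), (2, 2, 0.5)]

written_a = [horizontal, vertical]
# Same shape, but the vertical stroke comes first, is drawn upward, and the timing differs.
written_b = [
    [(x, y, t + 1.0) for x, y, t in reversed(vertical)],
    [(x, y, t * 2) for x, y, t in horizontal],
]

print(written_a == written_b)                                    # False: online data differ
print(rasterize(written_a, 5, 3) == rasterize(written_b, 5, 3))  # True: pixels are identical
```

Everything an online recognizer could use (order, direction, velocity) is gone once only the bitmap remains.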

  5. Online Handwriting Recognition
     • Modern systems are moderately successful, e.g. Microsoft Research's new Tablet PC
     (Figure label: polynomial coefficients, e.g. [0.94, 0.05, 0.29, ...])
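The slide shows strokes summarized as polynomial coefficients but does not say how they are computed. As an illustrative assumption, this sketch fits x(t) and y(t) of a single stroke with least-squares polynomials and concatenates the coefficients into a feature vector. `polyfit` and `stroke_features` are hypothetical names. The normal-equation solver is written in pure Python only so the block is self-contained.

```python
from typing import List, Sequence, Tuple

def polyfit(ts: Sequence[float], vs: Sequence[float], degree: int) -> List[float]:
    """Least-squares coefficients [c0, c1, ...] with v ~ c0 + c1*t + c2*t^2 + ..."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T v for the Vandermonde matrix A[k][i] = t_k ** i.
    ata = [[sum(t ** (i + j) for t in ts) for j in range(n)] for i in range(n)]
    atv = [sum(v * t ** i for t, v in zip(ts, vs)) for i in range(n)]
    m = [row[:] + [b] for row, b in zip(ata, atv)]  # augmented matrix
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = s / m[r][r]
    return coeffs

def stroke_features(points: Sequence[Tuple[float, float]], degree: int = 3) -> List[float]:
    """Hypothetical feature vector: coefficients of x(t) followed by those of y(t), t in [0, 1].
    Needs at least degree + 1 points."""
    k = len(points)
    ts = [i / (k - 1) for i in range(k)]
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return polyfit(ts, xs, degree) + polyfit(ts, ys, degree)

print(stroke_features([(0, 0), (1, 2), (2, 3), (3, 3), (4, 2)]))
```

A fixed-length vector like this is easy to compare across strokes of different lengths, which is one plausible reason to use coefficients instead of raw samples.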

  6. Offline Handwriting Recognition
     • A difficult problem
     • Almost as many approaches as there are researchers, e.g.:
       • Pattern recognition
       • Statistical analysis
       • Mathematical modelling
       • Physics-based modelling
       • Subgraph matching / graph search
       • Neural networks / machine learning
       • Fractal image compression
       • ... (too many to list) ...

  7. Previous Work: Offline   Online Conversion Online Conversion Previous Work: Offline • Finding contour Finding contour • Finding midline Finding midline • Stroke ordering – difficult problem Stroke ordering – difficult problem

  8. Offline  Online Conversion ctd. Offline  Online Conversion ctd. • Especially difficult with genealogical records: Especially difficult with genealogical records: • Stroke ordering: difficult Stroke ordering: difficult • Broken lines / blobs? Broken lines / blobs? • Not practical Not practical

  9. Previous Work: Holistic Matching
     • The whole word is stretched to match known words
     • Sources of variation compound across the word
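One common way to "stretch" a whole word onto a known word is dynamic time warping (DTW) over a per-column feature sequence. The slide does not say which stretching method the cited work used, so treat DTW as an assumed example. The column-ink "templates" below are made-up numbers, not real data.

```python
from typing import Dict, Sequence

def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Either advance in both sequences or stretch one of them.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

def best_match(profile: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the known word whose template warps most cheaply onto the observed profile."""
    return min(templates, key=lambda word: dtw_distance(profile, templates[word]))

# Hypothetical column-ink profiles for two known words.
templates = {"Mary": [0, 3, 5, 3, 0], "John": [4, 1, 1, 4, 0]}
observed = [0, 0, 3, 3, 5, 5, 3, 0]  # "Mary" written wider
print(best_match(observed, templates))  # Mary
```

Because one alignment covers the whole word, a local distortion in one letter adds to the cost of the entire match. This is the compounding of variation the slide points out.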

  10. Previous Work: Sliding Window
     • A narrow vertical window slides across the word
     • A state machine recognizes sequences
     • Results good, but sensitive to noise
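The following covers only the sliding-window half of this approach. A narrow window moves across a binary word image and emits one small feature vector per position. The state machine that would consume the sequence (for example a hidden Markov model) is not shown. The window width, the step, and the top/middle/bottom ink-count features are assumptions chosen for illustration.

```python
from typing import List, Tuple

def sliding_window_features(image: List[List[int]], width: int = 2,
                            step: int = 1) -> List[Tuple[int, int, int]]:
    """Slide a vertical window across a binary image (1 = ink).
    Each window position yields ink counts in the top, middle and bottom thirds."""
    height, cols = len(image), len(image[0])
    bands = [(0, height // 3), (height // 3, 2 * height // 3), (2 * height // 3, height)]
    features = []
    for left in range(0, cols - width + 1, step):
        vec = tuple(
            sum(image[r][c] for r in range(top, bottom) for c in range(left, left + width))
            for top, bottom in bands
        )
        features.append(vec)
    return features

word = [
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
]
print(sliding_window_features(word))  # [(1, 2, 1), (0, 1, 0), (0, 0, 1)]
```

A single stray ink pixel changes every window that covers it, which fits the slide's note that results are sensitive to noise.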

  11. Previous Work: Parascript
     • Features detected & put in sequence
     • Letters warped to best match the sequence of features
     • Complex; sensitive to noise

  12. Handwriting Recognition
     • Some aspects of handwriting recognition:
       • Segmentation problem (can't read a word until it is segmented; can't segment a word until it is read)
         (Figure: an ambiguous glyph labelled "nr?" / "m?")
       • Different handwriting styles
       • Use of a dictionary to correct errors in reading (e.g. "Srnitb" → "Smith")
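Here is one minimal way to perform the dictionary correction in the "Srnitb" → "Smith" example: choose the dictionary word with the smallest Levenshtein (edit) distance to the raw reading. Plain edit distance is an assumption for illustration. A practical system would probably weight common confusions such as "rn"/"m" more cheaply, and the three-name dictionary is invented.

```python
from typing import Iterable

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb))) # substitute (free if equal)
        prev = cur
    return prev[-1]

def correct(reading: str, dictionary: Iterable[str]) -> str:
    """Return the dictionary word closest to the raw reading."""
    return min(dictionary, key=lambda word: levenshtein(reading, word))

print(levenshtein("Srnitb", "Smith"))                  # 3
print(correct("Srnitb", ["Smith", "Jones", "Brown"]))  # Smith
```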

  13. Thesis Approach: Preprocessing
     • Outlines of the word are traced and smoothed (figure)
     • Handwriting slope is corrected for automatically (figure)

  14. Segmentation
     • Goal: robustly cut letters into segments
     • Match multiple segments to detect letters
     • Easier than matching a whole letter
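The slide does not describe how cut points are chosen. As one plausible sketch, the code below over-segments a word at columns where the vertical ink profile reaches a local minimum of at most `max_ink` pixels, then drops pieces that contain no ink. The threshold and the helper names are hypothetical. A low threshold will sometimes split one letter into several pieces. This matches the slide's idea that letters are recognized by matching multiple segments.

```python
from typing import List, Tuple

def column_profile(image: List[List[int]]) -> List[int]:
    """Ink pixels per column of a binary word image (1 = ink)."""
    return [sum(col) for col in zip(*image)]

def cut_points(profile: List[int], max_ink: int = 1) -> List[int]:
    """Columns that are local minima of the profile with at most max_ink pixels."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] <= max_ink
            and profile[i] <= profile[i - 1]
            and profile[i] <= profile[i + 1]]

def segments(profile: List[int], max_ink: int = 1) -> List[Tuple[int, int]]:
    """Split [0, len) at the cut points; keep only pieces that contain ink."""
    bounds = [0] + cut_points(profile, max_ink) + [len(profile)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if any(profile[a:b])]

profile = [0, 2, 3, 1, 2, 4, 0, 0, 3, 2]
print(segments(profile))  # [(0, 3), (3, 6), (7, 10)]
```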

  15. Dynamic Global Search
     • Assemble the word's spelling from possible letter readings
     • Best path: "Williarw Suwkino" (65% confidence)
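To illustrate a "best path" over possible letter readings, this sketch uses a Viterbi-style dynamic program. It assumes each letter hypothesis covers a contiguous run of segments and carries a confidence, and it treats a path's confidence as the product of its letters' confidences. The lattice format and the numbers are hypothetical. The example mirrors the "nr?"/"m?" ambiguity from slide 12.

```python
from typing import Dict, List, Tuple

# Hypothetical lattice: (first_segment, end_segment) -> [(letter, confidence), ...]
Lattice = Dict[Tuple[int, int], List[Tuple[str, float]]]

def best_reading(num_segments: int, lattice: Lattice) -> Tuple[str, float]:
    """Highest-confidence spelling that covers segments 0..num_segments exactly once."""
    # best[k] = (confidence, spelling) of the best reading of segments 0..k
    best: List[Tuple[float, str]] = [(0.0, "")] * (num_segments + 1)
    best[0] = (1.0, "")
    for end in range(1, num_segments + 1):
        for (start, stop), options in lattice.items():
            if stop != end or best[start][0] == 0.0:
                continue
            for letter, conf in options:
                score = best[start][0] * conf
                if score > best[end][0]:
                    best[end] = (score, best[start][1] + letter)
    conf, spelling = best[num_segments]
    return spelling, conf

lattice: Lattice = {
    (0, 1): [("r", 0.6)],
    (1, 2): [("n", 0.7)],
    (0, 2): [("m", 0.5)],  # one letter spanning two segments
    (2, 3): [("e", 0.9)],
}
print(best_reading(3, lattice))  # ('me', 0.45): beats "rne" at 0.6 * 0.7 * 0.9 = 0.378
```

A dictionary step like the one on slide 12 could then be applied to the winning spelling, or used to prune the lattice during the search.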

  16. Results (1)

  17. Results (2)

  18. Results (3)

  19. Results (4)
     • In general, results were even worse: the system worked well only on words it had been specifically trained on

  20–26. The Human Brain's Visual System
     (Layered diagram, built up one layer per slide; listed top to bottom as on the final slide:)
     • Recognized word: "Joseph"
     • Letter / word shape recognizers (e.g. "J")
     • Lateral inhibition
     • Feature detectors
     • Feedback
     • Line / curve detectors
     • ...
     • Angular edge detectors
     • Retina
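Lateral inhibition, as labelled in the diagram, means that competing detectors suppress one another until the strongest one dominates. A classic computational model of this is MAXNET. The sketch below uses that model as an assumed illustration, not something the presentation specifies. Each unit repeatedly loses `epsilon` times the total activity of its rivals until at most one unit remains active. Exact ties never separate, so the loop is capped by `max_iter`.

```python
from typing import List, Optional, Sequence

def lateral_inhibition(activations: Sequence[float], epsilon: Optional[float] = None,
                       max_iter: int = 1000) -> List[float]:
    """MAXNET-style competition among detectors (e.g. letter recognizers)."""
    a = [max(0.0, x) for x in activations]
    if epsilon is None:
        epsilon = 1.0 / (2 * len(a))  # must stay below 1 / len(a) for the winner to survive
    for _ in range(max_iter):
        if sum(1 for x in a if x > 0) <= 1:
            break
        total = sum(a)
        # Every unit is inhibited by the combined activity of all the others.
        a = [max(0.0, x - epsilon * (total - x)) for x in a]
    return a

# Three letter detectors responding to the same glyph; the strongest one wins.
print(lateral_inhibition([0.5, 0.9, 0.7], epsilon=0.2))
```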

  27. Conclusions
     • Handwriting recognition is important for genealogy... ...but it is hard
     • Current methods don't work very well... ...and they don't operate much like the human brain
     • Future work should focus on understanding the brain and emulating it as much as possible, e.g. with:
       • Hierarchical reasoning
       • Feedback
       • Lateral inhibition

  28. Questions?
     Luke Hutchison, lukeh@email.byu.edu
