Book Scanning – Technologies and Techniques


Mike Mansfield, Director of Content Engineering, Ancestry.com / Genealogy.com


  1. Book Scanning – Technologies and Techniques
     - Mike Mansfield, Director of Content Engineering, Ancestry.com / Genealogy.com

  2. Outline
     - Project Analysis
     - Scanning Parameters
     - Book Scanners

  3. Project Analysis Overview
     - Scope
     - Goals
     - Project and Customer Requirements
     - Content Evaluation

  4. Project Analysis
     - Assessment
     - Selection
     - Information Representation – Goals and Metrics
     - Funding
     - Planning and resource assignment
     - Prepare the originals for digitization
     - Scanning
     - Quality Assurance
     - Post-Processing – OCR, Compression, Format Conversion
     - Return originals to the collection
     - Host the data
     - Archiving and preservation

  5. Book Scanning Parameters Overview
     - Resolution
     - Bit Depth
     - Dynamic Range
     - Tonal Sensitivity
     - Geometrical Corrections
     - De-Skew
     - Curve Correction
     - Text Crushing
     - Masking and Cropping

  6. Resolution
     - Samples Per Inch (SPI), Dots Per Inch (DPI), Pixels Per Inch (PPI)
     - Archival Quality
     - Access Quality
     - “Faithful” representation of the page
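
SPI, DPI, and PPI all express the same idea: how many samples the scanner records per inch of page. A minimal sketch of the arithmetic follows. The US Letter page size and the list of resolutions are illustrative assumptions, not figures from the slides.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width_px, height_px) for a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed example: a US Letter page (8.5 x 11 in) at three common resolutions.
for dpi in (200, 300, 400):
    w, h = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi} DPI: {w} x {h} px ({w * h / 1e6:.1f} megapixels)")
```

Because both dimensions scale with the resolution, the pixel count grows with the square of the DPI, which is one reason archival and access copies are often produced at different resolutions.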

  7. Resolution and OCR
     - Most OCR engines are optimized for 300 DPI images with typefaces in point sizes between 10 and 14.
     - Where the characters on an image are very small (point size of 6 or less), scanning at 400 DPI can improve character recognition.
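
The relationship between point size and resolution can be made concrete with a little geometry. The sketch below assumes the usual convention of 72 points per inch and treats the nominal point size as the type height; real glyphs are somewhat smaller than the nominal size, and the function name is hypothetical. It says nothing about how any particular OCR engine behaves.

```python
POINTS_PER_INCH = 72  # standard typographic convention


def type_height_px(point_size, dpi):
    """Nominal height of the type body in pixels at a given scan resolution."""
    return point_size * dpi / POINTS_PER_INCH


# Compare the range the slide calls typical with small type at 300 and 400 DPI.
for point_size, dpi in ((10, 300), (14, 300), (6, 300), (6, 400)):
    print(f"{point_size} pt at {dpi} DPI: {type_height_px(point_size, dpi):.1f} px")
```

Under these assumptions, 6 pt type spans 25 px at 300 DPI and about 33 px at 400 DPI, which moves it closer to the roughly 42 px that 10 pt type occupies at 300 DPI.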

  8. Bit Depth
     - Number of colors or “tones” a scanner can differentiate
     - Bitonal
     - Grayscale
     - Color
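
Bit depth determines both the number of distinguishable tones and the raw size of each image. The sketch below assumes 1 bit per pixel for bitonal, 8 for grayscale, and 24 for color (8 bits per RGB channel); these are common values rather than figures from the slides, and the 2550 x 3300 px page is an assumed US Letter scan at 300 DPI.

```python
def tone_count(bits_per_pixel):
    """Number of distinct values a pixel can take at this bit depth."""
    return 2 ** bits_per_pixel


def uncompressed_bytes(width_px, height_px, bits_per_pixel):
    """Raw image size in bytes, rounded up to a whole byte."""
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8


# Assumed bit depths for the three modes named on the slide.
MODES = {"bitonal": 1, "grayscale": 8, "color": 24}

for name, bits in MODES.items():
    size_mb = uncompressed_bytes(2550, 3300, bits) / 1e6
    print(f"{name}: {tone_count(bits):,} tones, {size_mb:.1f} MB uncompressed")
```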

  9. Dynamic Range
     - A scanner's dynamic range is a measure of how well the device can record changes in the brightness of the image it's scanning

  10. Tonal Sensitivity
     - The ability of a scanner to accurately represent similar, adjacent tonal values as distinct from each other

  11. Geometrical Corrections
     - Deskew
     - Bookfold Corrections
     - Curve Correction
     - Text Crushing

  12. Deskew
     - Skew detection and correction

  13. Bookfold Corrections: Curve Correction and Text Crushing
     - Pages of bound books are three-dimensional surfaces

  14. Curve Correction and Text Crushing Compensation
     - Straighten curves and preserve uniform distances in the drape and gutters of scanned book pages

  15. Finger Masking
     - Methods to remove the images of the operator’s fingers holding down the pages during scanning

  16. Cropping and Page Splitting
     - Detecting and cropping edges to remove portions of the image containing the book cover, end-papers, spine edges, and page fan-outs.
     - Splitting double page images.
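
Splitting a double page image needs an estimate of where the gutter lies. The sketch below illustrates one simple heuristic, assuming the gutter shadow is the darkest column near the middle of a grayscale spread. That assumption, the one-third search band, and all function names are hypothetical; the slides do not describe the method actually used.

```python
def find_gutter(img, band_fraction=1 / 3):
    """Return the index of the darkest column within the central band of img."""
    height, width = len(img), len(img[0])
    half_band = int(width * band_fraction / 2)
    lo, hi = width // 2 - half_band, width // 2 + half_band

    def column_sum(x):
        return sum(img[y][x] for y in range(height))

    return min(range(lo, hi + 1), key=column_sum)


def split_spread(img):
    """Split a spread into left and right pages, dropping the gutter column."""
    gutter = find_gutter(img)
    left = [row[:gutter] for row in img]
    right = [row[gutter + 1:] for row in img]
    return left, right


def synthetic_spread(width=100, height=10, gutter=52):
    """White test spread (255) with a gutter shadow and a dark outer page edge."""
    img = [[255] * width for _ in range(height)]
    for row in img:
        row[gutter] = 40  # gutter shadow, slightly off-center
        row[2] = 0        # dark page edge, outside the central search band
    return img


left, right = split_spread(synthetic_spread())
print(len(left[0]), len(right[0]))
```

Restricting the search to a central band keeps dark page edges or cover material near the margins from being mistaken for the gutter. Edge cropping would need a separate step.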

  17. Not What We Want

  18. What We Do Want

  19. Book Scanners Overview
     - Document Scanners
     - Planetary Book Scanners
     - Flying Linear Arrays
     - Digital Photography
     - Robotic Page Turners

  20. Document Scanners
     - Cut the spine off the book and scan the loose pages in a document scanner
     - The book is rendered almost useless for additional use
     - Rebinding is expensive and slow
     - Makes most sense when a sacrificial copy of the book exists.

  21. Document Scanners
     - Extremely Fast
     - Feature Rich
     - Relatively Inexpensive
     - Large range of options and price points
     - Some limited applications in the Family History and Genealogy domain

  22. Document Scanners
     - Major office equipment manufacturers
     - Canon
     - Fujitsu
     - Kodak
     - Panasonic
     - Ricoh

  23. Document Scanners
     - Resolution: 100-600 DPI
     - Bit Depths: Bitonal, Grayscale, Color
     - Simplex / Duplex
     - 2 x 3 inch to 12 x 30 inch documents
     - Rate: A few hundred pages per day to tens of thousands of pages per day
     - Deskewing, cropping, dithering, dynamic thresholding, binarization, etc.

  24. Planetary Book Scanners
     - Specialized devices designed to do primarily one thing – scan bound books
     - CCD Array, integrated lighting, specialized scan beds/book cradles, and book specific image processing options

  25. Dissection of a Minolta PS 7000
     - 7,500 Pixel Reduction type line CCD
     - Halogen Lamp Lighting
     - Up to A2 Size
     - 200/300/400/600 DPI
     - Bitonal or 8-bit Grayscale
     - 4.5 Seconds per scan on an A4 page at 400 DPI

  26. Dissection of a Minolta PS 7000
     - Image Processing
     - Curvature Correction
     - Text Crushing Correction
     - Centering
     - Finger Masking
     - Spread/Single/Book Split
     - Linearization

  27. Dissection of a Minolta PS 7000
     - Articulating Book Cradle

  28. Dissection of a Minolta PS 7000
     - Scan buttons on the scan bed

  29. Minolta

  30. Bookeye

  31. Zeutschel

  32. Planetary Book Scanners
     - Resolutions from 300 DPI to 600 DPI
     - Bit-Depths: Bitonal, Grayscale, Full Color
     - Rich feature set well suited to large production projects
     - Book cradles, glass plates to reduce page curvature, specialized image processing, human-factors, etc.
     - Support for most book sizes from small books to large quarto volumes and smaller atlases
     - Proven technology, few moving parts, highly reliable
     - 1 page scan in 5-10 seconds
     - 1,500 – 3,000 pages per 8 hour shift
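
The two throughput figures on this slide can be cross-checked with simple arithmetic, and the same arithmetic gives a rough planning estimate. The sketch below assumes the whole 8 hour shift is productive; the 100,000-page project is a hypothetical example, not a figure from the presentation.

```python
SHIFT_SECONDS = 8 * 60 * 60  # one 8 hour shift, assumed fully productive


def seconds_per_page(pages_per_shift):
    """Average time per page implied by a shift throughput."""
    return SHIFT_SECONDS / pages_per_shift


def shifts_needed(total_pages, pages_per_shift):
    """Whole scanner-shifts required for a project (ceiling division)."""
    return -(-total_pages // pages_per_shift)


PROJECT_PAGES = 100_000  # hypothetical project size

for rate in (1500, 3000):
    print(f"{rate} pages/shift: {seconds_per_page(rate):.1f} s per page, "
          f"{shifts_needed(PROJECT_PAGES, rate)} shifts for {PROJECT_PAGES:,} pages")
```

The quoted shift throughput works out to between 9.6 and 19.2 seconds per page, against 5 to 10 seconds for the scan itself. A plausible reading is that the remainder goes to page turning and handling, though the slides do not break the time down.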
