Book Scanning – Technologies and Techniques

SLIDE 1

Book Scanning – Technologies and Techniques

Mike Mansfield
Director of Content Engineering
Ancestry.com / Genealogy.com

SLIDE 2

Outline

• Project Analysis
• Scanning Parameters
• Book Scanners

SLIDE 3

Project Analysis Overview

• Scope
• Goals
• Project and Customer Requirements
• Content Evaluation

SLIDE 4

Project Analysis

• Assessment
• Selection
• Information Representation – Goals and Metrics
• Funding
• Planning and resource assignment
• Prepare the originals for digitization
• Scanning
• Quality Assurance
• Post-Processing – OCR, Compression, Format Conversion
• Return originals to the collection
• Host the data
• Archiving and preservation

SLIDE 5

Book Scanning Parameters Overview

• Resolution
• Bit Depth
• Dynamic Range
• Tonal Sensitivity
• Geometrical Corrections
  • Deskew
  • Curve Correction
  • Text Crushing
• Masking and Cropping

SLIDE 6

Resolution

• Samples Per Inch (SPI), Dots Per Inch (DPI), Pixels Per Inch (PPI)
• Archival Quality
• Access Quality
• “Faithful” Representation of the page

SLIDE 7

Resolution and OCR

• Most OCR engines are optimized for 300 DPI images with typefaces in point sizes between 10 and 14.
• In cases where the font size of characters in an image is very small (point size of 6 or less), scanning images at 400 DPI can improve character recognition.
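The effect of resolution on small type can be estimated with simple arithmetic. The sketch below is an illustration only: it assumes the usual 72 points per inch and treats the point size as the full body height of the type, and the function name is hypothetical rather than taken from the presentation.

```python
POINTS_PER_INCH = 72  # assumption: standard desktop-publishing point


def type_height_px(point_size, dpi):
    """Approximate body height of type, in pixels, at a given scan resolution."""
    return point_size * dpi / POINTS_PER_INCH


# Compare the typical OCR range (10-14 pt at 300 DPI) with very small type.
for point_size, dpi in [(10, 300), (14, 300), (6, 300), (6, 400)]:
    height = type_height_px(point_size, dpi)
    print(f"{point_size} pt at {dpi} DPI: about {height:.1f} px")
```

Under these assumptions, 6-point type is about 25 pixels tall at 300 DPI and about 33 pixels tall at 400 DPI, which moves it closer to the roughly 42 pixels that 10-point type occupies at 300 DPI.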

SLIDE 8

Bit Depth

• Number of colors or “tones” a scanner can differentiate
  • Bitonal
  • Grayscale
  • Color
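The number of tones available at a given bit depth is two raised to the number of bits per pixel. A small sketch follows; the specific depths of 1, 8, and 24 bits are common values assumed here for illustration, not figures stated on this slide.

```python
def tone_count(bits_per_pixel):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bits_per_pixel


# Assumed typical depths for each mode (not specified on the slide).
MODES = {"Bitonal": 1, "Grayscale": 8, "Color": 24}

for name, bits in MODES.items():
    print(f"{name}: {bits} bit(s) per pixel, {tone_count(bits):,} possible values")
```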

SLIDE 9

Dynamic Range

• A scanner's dynamic range is a measure of how well the device can record changes in the brightness of the image it's scanning

SLIDE 10

Tonal Sensitivity

• The ability of a scanner to accurately represent similar, adjacent tonal values as distinct from each other

SLIDE 11

Geometrical Corrections

• Deskew
• Bookfold Corrections
  • Curve Correction
  • Text Crushing

SLIDE 12

Deskew

• Skew detection and correction
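One common way to detect skew is a projection-profile search: try a range of candidate angles and keep the one at which the dark pixels collapse into the sharpest horizontal rows. The sketch below is a minimal illustration of that idea on synthetic data. It is not the method used by any scanner named in this presentation, and the function names, angle range, and step size are assumptions.

```python
import math


def estimate_skew(pixels, max_angle=5.0, step=0.5):
    """Return the candidate angle (degrees) whose row profile is sharpest.

    pixels is an iterable of (x, y) coordinates of dark pixels.
    """
    best_angle, best_score = 0.0, -1
    steps = int(round(max_angle / step))
    for i in range(-steps, steps + 1):
        angle = i * step
        slope = math.tan(math.radians(angle))
        rows = {}
        for x, y in pixels:
            # Shear each pixel back along the candidate baseline slope.
            row = round(y - x * slope)
            rows[row] = rows.get(row, 0) + 1
        # Sum of squares is largest when pixels pile up in few rows.
        score = sum(count * count for count in rows.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def synthetic_lines(angle, width=200, baselines=(20, 40, 60)):
    """Build dark-pixel coordinates for straight text baselines at a known skew."""
    slope = math.tan(math.radians(angle))
    return [(x, round(y0 + x * slope)) for y0 in baselines for x in range(width)]


detected = estimate_skew(synthetic_lines(2.0))
print(f"Estimated skew: {detected} degrees")
```

Correction would then rotate or shear the page image by the opposite of the detected angle, a step omitted here.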

SLIDE 13

Bookfold Corrections: Curve Correction and Text Crushing

• Pages of bound books are three-dimensional surfaces

SLIDE 14

SLIDE 15

Curve Correction and Text Crushing Compensation

• Straighten curves and preserve uniform distances in the drape and gutters of scanned book pages

SLIDE 16

Finger Masking

• Methods to remove the images of the operator’s fingers holding down the pages during scanning

SLIDE 17

Cropping and Page Splitting

• Detecting and cropping edges to remove portions of the image containing the book cover, end-papers, spine edges, and page fan-outs.
• Splitting double page images.
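Page splitting can be sketched as finding the gutter and cutting the spread there. The example below is a hedged illustration: it assumes the gutter shows up as the darkest column in the central fifth of a grayscale image, which is a simple heuristic chosen for this sketch and not a method described in the presentation. The function names are hypothetical.

```python
def find_gutter(image):
    """Return the x position of the darkest column in the central fifth.

    image is a list of rows of grayscale values (0 = black, 255 = white).
    """
    width = len(image[0])
    lo, hi = width * 2 // 5, width * 3 // 5
    column_sums = [sum(row[x] for row in image) for x in range(lo, hi)]
    return lo + column_sums.index(min(column_sums))


def split_spread(image):
    """Split a double-page image into left and right page images."""
    gutter = find_gutter(image)
    left = [row[:gutter] for row in image]
    right = [row[gutter:] for row in image]
    return left, right


# Synthetic 4 x 20 spread: white paper with a dark gutter shadow at column 10.
spread = [[255] * 20 for _ in range(4)]
for row in spread:
    row[10] = 40

left_page, right_page = split_spread(spread)
print(len(left_page[0]), len(right_page[0]))
```

Restricting the search to the middle of the image keeps dark page edges or cover material near the borders from being mistaken for the gutter.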

SLIDE 18

Not What We Want

SLIDE 19

SLIDE 20

What We Do Want

SLIDE 21

Book Scanners Overview

• Document Scanners
• Planetary Book Scanners
• Flying Linear Arrays
• Digital Photography
• Robotic Page Turners

SLIDE 22

Document Scanners

• Cut the spine off the book and scan the loose pages in a document scanner
• The book is rendered almost useless for further use
• Rebinding is expensive and slow
• Makes most sense when a sacrificial copy of the book exists.

SLIDE 23

Document Scanners

• Extremely Fast
• Feature Rich
• Relatively Inexpensive
• Large range of options and price points
• Some limited applications in the Family History and Genealogy domain

SLIDE 24

Document Scanners

• Major office equipment manufacturers
  • Canon
  • Fujitsu
  • Kodak
  • Panasonic
  • Ricoh

SLIDE 25

Document Scanners

• Resolution: 100-600 DPI
• Bit Depths: Bitonal, Grayscale, Color
• Simplex / Duplex
• 2 x 3 inch to 12 x 30 inch documents
• Rate: a few hundred to tens of thousands of pages per day
• Deskewing, cropping, dithering, dynamic thresholding, binarization, etc.

SLIDE 26

Planetary Book Scanners

• Specialized devices designed to do primarily one thing – scan bound books
• CCD array, integrated lighting, specialized scan beds/book cradles, and book-specific image processing options

SLIDE 27

Dissection of a Minolta PS 7000

• 7,500-pixel reduction-type line CCD
• Halogen Lamp Lighting
• Up to A2 Size
• 200/300/400/600 DPI
• Bitonal or 8-bit Grayscale
• 4.5 seconds per scan on an A4 page at 400 DPI

SLIDE 28

Dissection of a Minolta PS 7000

• Image Processing
  • Curvature Correction
  • Text Crushing Correction
  • Centering
  • Finger Masking
  • Spread/Single/Book Split
  • Linearization

SLIDE 29

Dissection of a Minolta PS 7000

• Articulating Book Cradle

SLIDE 30

Dissection of a Minolta PS 7000

• Scan buttons on the scan bed

SLIDE 31

Minolta

SLIDE 32

Bookeye

SLIDE 33

Zeutschel

SLIDE 34

Planetary Book Scanners

• Resolutions from 300 DPI to 600 DPI
• Bit-Depths: Bitonal, Grayscale, Full Color
• Rich feature set well suited to large production projects
  • Book cradles, glass plates to reduce page curvature, specialized image processing, human factors, etc.
• Support for most book sizes from small books to large quarto volumes and smaller atlases
• Proven technology, few moving parts, highly reliable
• 1 page scan in 5-10 seconds
• 1,500 – 3,000 pages per 8 hour shift
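The scan time and the per-shift throughput quoted above can be compared directly. The arithmetic below is only a sanity check. Reading the gap between the two as handling time (page turning, positioning, and similar operator work) is an assumption, not something the slide states, and the function names are illustrative.

```python
SHIFT_SECONDS = 8 * 60 * 60  # one 8-hour shift


def pages_per_shift(seconds_per_scan):
    """Upper bound on pages per shift if the scanner never sat idle."""
    return SHIFT_SECONDS / seconds_per_scan


def seconds_per_page(pages):
    """Average wall-clock time per page implied by a per-shift page count."""
    return SHIFT_SECONDS / pages


# Scan-only ceiling at 5 and 10 seconds per page.
print(pages_per_shift(5), pages_per_shift(10))
# Time per page implied by 3,000 and 1,500 pages per shift.
print(seconds_per_page(3000), seconds_per_page(1500))
```

At 5 to 10 seconds per scan the scan-only ceiling is 2,880 to 5,760 pages per shift, while 1,500 to 3,000 pages per shift works out to about 9.6 to 19.2 seconds per page overall.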

SLIDE 35

Flying Linear Arrays

• Integrated flying linescan CCDs and lighting systems with specialized scan tables and book cradles

SLIDE 36

i2S – DigiBook, Zeutschel

SLIDE 37

Flying Linear Arrays

• Resolutions up to 800 DPI
• Bit-Depths: Bitonal, Grayscale, Full Color
• Very high quality scans
• Feature-rich systems with book-specific support
  • Book cradles, glass plates, human-operation factors
• Support for very large volumes, atlases, and maps up to one meter square
• 1 page scan in 2-6 seconds
• 2,500 – 4,500 pages per 8 hour shift

SLIDE 38

Digital Photography

• Large Photographic Formats
• Scanbacks
• Professional Photographic Optics and Cameras
• Studio Lighting Systems
• Color Management Software
• Custom camera positioning and book holders

SLIDE 39

Large Format Photography

• Better Tonality
• Higher color depth
• Sharper
• Grain-free
• More control over the final geometry and perspective of the photographed pages

SLIDE 40

Scanbacks

• Large Format Cameras
• Trilinear array, 1-pass scan
• 6000 x 7250 pixels = 43.5 megapixels
• 30+ seconds for a single scan

SLIDE 41

Digitizing the Gutenberg Bible

SLIDE 42

Digitizing the Gutenberg Bible

SLIDE 43

Anagramm Picture Gate 8000 Scanback

• Trilinear CCD Array, 1-pass scan
• Large Format Camera 9 x 12 cm
• 8000 x 9700 Pixel Optical Resolution = 77.6 megapixels
• 48 Bit Color Depth
• 444 MB File in 48 Bit Color
• 40 Seconds for a full scan
• Fiber-optic connection
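The quoted file size follows from the pixel count and the bit depth. A quick check, assuming uncompressed storage and that "MB" on the slide means 2^20 bytes (both are assumptions made for this calculation):

```python
width_px, height_px = 8000, 9700
bits_per_pixel = 48  # 16 bits for each of three color channels

pixels = width_px * height_px
size_bytes = pixels * bits_per_pixel // 8
size_mib = size_bytes / 2**20  # assumption: 1 MB = 2**20 bytes

print(f"{pixels / 1e6:.1f} megapixels")
print(f"{size_bytes:,} bytes, about {size_mib:.0f} MB uncompressed")
```

This gives 77.6 megapixels and roughly 444 MB, matching the figures on the slide.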

SLIDE 44

Digital Photography

• Super high quality images
• Custom lighting and positioning
• Slow
  • Scanning page images is slow
  • Positioning each page is slow
• Requires skilled and experienced photographers
• Few applications in the Family History and Genealogy domain

SLIDE 45

Robotic Page Turners

• Kirtas APT BookScan 1200
• i2S DigiBook – Digitizing Line

SLIDE 46

Conclusion

• Analyze your project’s requirements and scope
• Understand the content and determine the scanning metrics
• Match the scanning technology to the content and project goals

SLIDE 47

Questions

MMansfield@Myfamilyinc.com