Quality OCR: A Tango of Available Resources



SLIDE 1

TINY TEXT AHEAD! Move up!

SLIDE 2

Quality OCR

A TANGO OF AVAILABLE RESOURCES

Michelle Paolillo, Digital Lifecycle Lead
Mira Basara, Ingest Collection Specialist
Cornell University

SLIDE 5

Who among us…

…are OCR nerds?

SLIDE 6

History ‐ OCR at Cornell

D‐Lib Magazine, October 1996, http://mirror.dlib.org/dlib/october96/cornell/10chapman.html

SLIDE 7

evolutions…

TextBridge

SLIDE 8
Our current practice

CHALLENGES AND STRATEGIES

SLIDE 9
SLIDE 10

TextBridge 2007 ABBYY FR 2019

SLIDE 11

TextBridge 2005 ABBYY FR 2019

SLIDE 12

Latin, Greek Latin

SLIDE 13

Original image Inverted, cropped, deskewed image

SLIDE 14

OCR from Original OCR from inverted, cropped
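
The preprocessing named on slides 13 and 14 (inverting and cropping before OCR) can be sketched in a few lines. This is only an illustration under stated assumptions: the image is a toy grayscale grid held in nested lists, the function names and the threshold of 128 are hypothetical, and deskewing is left out. It is not the workflow or tooling used at Cornell.

```python
def invert(image, max_value=255):
    """Flip every pixel so light-on-dark text becomes dark-on-light."""
    return [[max_value - px for px in row] for row in image]


def crop_to_ink(image, threshold=128):
    """Crop to the bounding box of dark ("ink") pixels below the threshold."""
    rows = [r for r, row in enumerate(image) if any(px < threshold for px in row)]
    cols = [c for c in range(len(image[0])) if any(row[c] < threshold for row in image)]
    if not rows or not cols:
        return image  # no ink found, so leave the image untouched
    top, bottom = rows[0], rows[-1]
    left, right = cols[0], cols[-1]
    return [row[left:right + 1] for row in image[top:bottom + 1]]


# Toy 4x5 negative (assumed data): light text (255) on a dark background (0).
negative = [
    [0, 0, 0, 0, 0],
    [0, 255, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [0, 0, 0, 0, 0],
]

positive = invert(negative)      # dark text on a light background
cropped = crop_to_ink(positive)  # margins removed
```

In practice an imaging library would do this work on real scans; the point here is only the order of operations: invert first, so that the text counts as ink, then crop to it.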

SLIDE 15
SLIDE 16
SLIDE 17

Character fill – pattern training helps

SLIDE 18
SLIDE 19
SLIDE 20

OCR of Fraktur (as found in older German texts) is very difficult.

  • For many characters, rules govern the form of the character depending on its position.

  • Other characters that are interpreted differently may resemble each other closely.

Fraktur
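
One familiar case of a position-dependent form, not named on the slide, is the long s (ſ, U+017F): it is generally used at the start or middle of a word, while the round s closes a word. The sketch below shows how OCR output containing the long s might be normalized for searching, and how a long s at the end of a word might be flagged as a likely misreading. The function names are hypothetical and the end-of-word rule is a simplification, so treat the flag as a heuristic.

```python
import re

LONG_S = "\u017f"  # ſ, LATIN SMALL LETTER LONG S


def normalize_long_s(text):
    """Replace every long s with a round s so modern search terms match."""
    return text.replace(LONG_S, "s")


def words_ending_in_long_s(text):
    """Return words whose final letter is a long s (suspicious in Fraktur)."""
    return re.findall(r"\w*" + LONG_S + r"\b", text)


# Assumed sample strings, not taken from the presentation.
normalized = normalize_long_s("Waſſerſtraße")
suspects = words_ending_in_long_s("das Hauſ iſt groß")
```

Keeping the original transcription alongside the normalized text preserves the historical spelling while still supporting search.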

SLIDE 21
SLIDE 22

Fraktur OCR ‐ Google Cloud Vision

SLIDE 23
SLIDE 24
SLIDE 25
Opportunities

DECISION POINTS

SLIDE 26

Points of opportunity Re‐OCR

SLIDE 27

Forced decision point

Upgraded

SLIDE 28

Testing alternatives
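
One common way to compare alternative OCR engines, offered here as an assumption rather than as the method the presenters used, is to score each engine's output against a hand-corrected reference using the character error rate (edit distance divided by reference length). The engine names and sample strings below are made up for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_error_rate(reference, hypothesis):
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


# Hypothetical outputs from two unnamed engines for the same line of text.
reference = "Quality OCR"
outputs = {"engine_a": "Qua1ity OCR", "engine_b": "Quality OCR"}
scores = {name: character_error_rate(reference, text) for name, text in outputs.items()}
best = min(scores, key=scores.get)
```

A lower score means fewer character-level corrections are needed; a representative sample of pages per collection gives a fairer comparison than a single line.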

SLIDE 29
SLIDE 30
SLIDE 31

Quality OCR

A TANGO OF AVAILABLE RESOURCES

Michelle Paolillo, Digital Lifecycle Lead
Mira Basara, Ingest Collection Specialist
Cornell University

SLIDE 33

Discussion: navigate to www.menti.com and use code 97 04 40