GlyphMiner: A System for Efficiently Extracting Glyphs from Early Prints in the Context of OCR

SLIDE 1

GlyphMiner: A System for Efficiently Extracting Glyphs from Early Prints in the Context of OCR

Benedikt Budig, Thomas C. van Dijk, Felix Kirchner

SLIDE 2

Incunables: What are they?

  • Early prints (before 1500)
  • Movable type
  • Design resembles medieval handwriting
  • Circa 30,000 editions

SLIDE 3

The Ship of Fools

  • Incunables of Sebastian Brant's Narrenschiff
  • One of the most popular books in the early modern period

Basel 1494, German
Nürnberg 1494, German
Basel 1498, Latin
Paris 1497, French
London 1509, English

SLIDE 4

Challenges for OCR

  • Many different glyphs: abbreviations, ligatures, diacritics
  • Variance in printing, poor conservation state
  • Off-the-shelf OCR software fails → train general-purpose OCR software (e.g. Tesseract or OCRopus), which requires training data

SLIDE 5

Introducing GlyphMiner

  • Objective: obtain training data for OCR software
  • Existing practice: e.g. Aletheia, Franken+, Gamera
  • Mark example glyph on arbitrary page → get occurrences from all pages of the print
  • Use occurrences to create realistic training images
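
The last step, turning collected occurrences into training images, can be illustrated with a minimal sketch. It assumes that each occurrence is available as a small binary crop (a list of rows, 1 = ink) and that a training line is built by pasting one randomly chosen crop per character side by side. The function name `compose_line`, the `gap` parameter and the toy crops are hypothetical and not taken from GlyphMiner itself.

```python
import random

def compose_line(text, crops, rng, gap=1):
    """Paste one randomly chosen crop per character, left to right, bottom-aligned.

    crops maps a character to a list of binary images (lists of rows, 1 = ink).
    """
    chosen = [rng.choice(crops[ch]) for ch in text]
    height = max(len(c) for c in chosen)
    line = [[] for _ in range(height)]
    for i, crop in enumerate(chosen):
        width = len(crop[0])
        pad = height - len(crop)            # empty rows above a short glyph
        rows = [[0] * width] * pad + crop
        for y in range(height):
            if i > 0:
                line[y].extend([0] * gap)   # blank columns between glyphs
            line[y].extend(rows[y])
    return line

# Hypothetical toy crops; real ones would be cut from the scanned pages.
crops = {
    "a": [[[1, 1], [1, 1]], [[1, 0], [0, 1]]],
    "b": [[[1], [1], [1]]],
}
line = compose_line("ab", crops, random.Random(0))
```

Because every character is drawn from real occurrences, repeated calls yield lines that vary the way the print itself varies, which is presumably what makes such images "realistic".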

SLIDE 28

GlyphMiner: Two Ingredients

Ingredient 1: Template Matching

  • Find approximate repeat occurrences of an example image
  • Here: black-and-white, only translation
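
A minimal sketch of translation-only template matching on black-and-white images may help here. The slides do not say which similarity measure is used; the fraction of agreeing pixels and the `min_score` cut-off below are assumptions chosen for illustration, and `match_template` is a hypothetical helper, not GlyphMiner's API.

```python
def match_template(page, template, min_score=0.9):
    """Slide the template over the page (translation only) and return
    (row, col, score) for every offset whose pixel agreement reaches min_score."""
    th, tw = len(template), len(template[0])
    ph, pw = len(page), len(page[0])
    hits = []
    for r in range(ph - th + 1):
        for c in range(pw - tw + 1):
            # Count pixels where page and template have the same colour.
            agree = sum(
                page[r + y][c + x] == template[y][x]
                for y in range(th)
                for x in range(tw)
            )
            score = agree / (th * tw)
            if score >= min_score:
                hits.append((r, c, score))
    return hits

# Toy binary images (1 = ink): one exact and one degraded occurrence.
template = [[1, 0],
            [1, 1]]
page = [[0, 0, 0, 0, 0, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 0, 0, 0, 0, 0]]

exact = match_template(page, template)                  # strict cut-off
loose = match_template(page, template, min_score=0.75)  # also degraded copy
```

Lowering the cut-off finds more true occurrences but also more false ones, which is why the candidates still have to be separated into correct and incorrect matches afterwards.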

Ingredient 2: Active Learning

  • Distinguish matches that are semantically correct from the rest
  • Efficient user interaction

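The slides do not spell out the active learning strategy. As one possible reading, the sketch below uses uncertainty sampling with a one-dimensional threshold classifier on the match score: the user (the `oracle`) is only asked about the candidate closest to the current decision boundary. All names, the seeding rule and the toy scores are assumptions made for illustration.

```python
def fit_threshold(labeled):
    """Midpoint between the best-scoring rejected match and the
    worst-scoring accepted match (labeled: list of (score, is_correct))."""
    accepted = [s for s, ok in labeled if ok]
    rejected = [s for s, ok in labeled if not ok]
    return (min(accepted) + max(rejected)) / 2

def active_learning(scores, oracle, budget):
    """Ask the oracle about at most `budget` matches; return (threshold, asked)."""
    pool = sorted(scores)
    # Seed: assume the lowest score is a wrong match and the highest a correct one.
    labeled = [(pool.pop(0), False), (pool.pop(), True)]
    asked = 0
    while pool and asked < budget:
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: query the score closest to the boundary.
        query = min(pool, key=lambda s: abs(s - threshold))
        pool.remove(query)
        labeled.append((query, oracle(query)))
        asked += 1
    return fit_threshold(labeled), asked

def oracle(score):
    """Hypothetical user: matches scoring at least 0.8 are truly correct."""
    return score >= 0.8

scores = [0.50, 0.58, 0.66, 0.72, 0.81, 0.88, 0.93, 1.00]
threshold, asked = active_learning(scores, oracle, budget=3)
```

In this toy run three questions are enough to separate all eight candidates, which hints at why asking about borderline matches makes for efficient user interaction.
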
SLIDE 29

Preliminary Case Study

  • On 2 out of 3 2 pages from the Latin Narrenschiff
  • Experiment 1: precision / recall / F1 score > 94 %
  • Experiment 2: GlyphMiner vs. manual glyph collection by an OCR engineer (for 45 minutes each)
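
For reference, the three measures reported for Experiment 1 are computed from the number of correct detections (true positives), wrong detections (false positives) and missed occurrences (false negatives). The counts in this sketch are hypothetical and are not taken from the case study.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from detection counts."""
    precision = tp / (tp + fp)   # share of detections that are correct
    recall = tp / (tp + fn)      # share of true occurrences that were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts, for illustration only.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
```

The function assumes at least one detection and one true occurrence; otherwise the divisions are undefined.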

                                  Aletheia    GlyphMiner
detected occurrences              1,251       17,426
different glyphs                  65          26
occurrences per glyph (median)    7           498
glyphs with support > 1 2 5 2 6
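
The quantities compared here (detected occurrences, different glyphs, median occurrences per glyph, glyphs above a support threshold) can all be derived from a flat list of labeled occurrences. The sketch below shows one way to do that; `glyph_stats`, `min_support` and the toy labels are hypothetical and not the study's data.

```python
from collections import Counter
from statistics import median

def glyph_stats(labels, min_support):
    """Summarise a list with one glyph label per detected occurrence."""
    counts = Counter(labels)
    return {
        "detected occurrences": len(labels),
        "different glyphs": len(counts),
        "occurrences per glyph (median)": median(counts.values()),
        "glyphs above support": sum(n > min_support for n in counts.values()),
    }

# Hypothetical toy labels, for illustration only.
labels = ["a"] * 5 + ["e"] * 3 + ["s"] * 1
stats = glyph_stats(labels, min_support=2)
```

A high count of different glyphs with a low median means many glyphs are backed by only a handful of examples, which matters when the occurrences are meant as training data.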

SLIDE 30

User Study at philtag 2016

  • Participants from Digital Humanities, OCR and linguistics
  • Hands-on session, five pages from German Narrenschiff
  • 3 minutes to process

SLIDE 31

User Study: Outcomes

  • 59 templates, 5 labels, 17 questionnaires
  • User evaluation:
    – enjoyable to work with
    – would use it in daily work
  • Reliability:
    – label consistency is high
    – classifier consistency is high

SLIDE 32

Conclusion & Future Work

  • Incunables are interesting, but hard to OCR
  • Human effort is necessary → smart interactions!
  • Need for good training data
  • Crowdsourcing?

SLIDE 36

Conclusion & Future Work

  • Incunables are interesting, but hard for OCR
  • Human effort is necessary → smart interactions!
  • Need for good training data
  • Crowdsourcing! But how exactly?
  • In-depth evaluation with Tesseract and OCRopus
  • Check our demo video on YouTube: