SLIDE 1

Efficient Binarization for Historical Document Analysis

Florian Westphal, Håkan Grahn, Niklas Lavesson

Blekinge Institute of Technology, Karlskrona, Sweden
flw@bth.se

2016-02-02

  • F. Westphal, H. Grahn, N. Lavesson (BTH)

Efficient Binarization 2016-02-02 1 / 20

SLIDE 2

Outline

1. Document Readability
2. Howe’s Binarization Algorithm
3. Heterogeneous Computing
4. Binarization Pipeline

SLIDE 3

BTH & ArkivDigital

BTH
  • Swedish university, established in 1989
  • Over 6000 registered students
  • BigData@BTH

ArkivDigital
  • Swedish company, established in 2004
  • Provides access to almost 60 million images
  • Church books, court records, military records, census records, . . .

SLIDE 4

Document Readability

SLIDE 5

Approach

SLIDE 6

Approach

SLIDE 7

Approach

SLIDE 8

Approach - Demo

SLIDE 9

Howe’s Binarization Algorithm

SLIDE 10

Howe’s Binarization Algorithm (Cont.)

SLIDE 11

Heterogeneous Computing

SLIDE 12

Binarization Pipeline

SLIDE 13

Binarization Pipeline (Cont.)

Step | I   | II  | III | IV  | V   | VI  | VII | VIII
1    | CPU | GPU | CPU | GPU | CPU | GPU | CPU | GPU
2    | CPU | CPU | GPU | GPU | CPU | CPU | GPU | GPU
3    | CPU | CPU | CPU | CPU | GPU | GPU | GPU | GPU
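The eight configurations in the table follow a binary pattern: step 1 switches device in every column, step 2 every two columns, and step 3 every four columns, which is consistent with the labels I (CCC) to VIII (GGG) used in the timing results. The Python sketch below regenerates the table under that reading. The helper names `pipeline_configurations` and `short_label`, and the bit-to-step mapping, are illustrative assumptions rather than part of the authors' implementation.

```python
def pipeline_configurations():
    """Return {numeral: (step1, step2, step3)} for configurations I..VIII.

    Hypothetical helper: configuration index k (0-based) runs step i+1 on
    the GPU when bit i of k is set, and on the CPU otherwise.
    """
    numerals = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]
    configs = {}
    for k, name in enumerate(numerals):
        configs[name] = tuple("GPU" if (k >> bit) & 1 else "CPU" for bit in range(3))
    return configs


def short_label(devices):
    """Compress e.g. ("GPU", "CPU", "CPU") to "GCC", the notation of the timing plots."""
    return "".join(device[0] for device in devices)


if __name__ == "__main__":
    for name, devices in pipeline_configurations().items():
        print(f"{name:>4} ({short_label(devices)})")
```

Encoding each configuration as a 3-bit number keeps the table and the plot labels in sync: adding a fourth step would only require iterating over 16 indices and 4 bits.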

SLIDE 14

Preliminary Results

[Figure panels: Reference Implementation | Configuration IV]

SLIDE 15

Preliminary Results (Cont.)

[Figure panels: Reference Implementation | Configuration VIII]

SLIDE 16

Preliminary Results - Binarization Performance

[Figure: Pseudo F-Measure (in %) and DRD for configurations with step 3 on the CPU (**C) and on the GPU (**G)]
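Pseudo F-Measure is an F-measure in which recall is replaced by pseudo-recall, measured against a skeletonized ground truth, so that a result recovering the centre line of each stroke is not penalized for slightly thinner strokes. The Python sketch below follows the simpler definition used in the earlier DIBCO contests (plain precision, skeleton-based pseudo-recall). It assumes binary images given as nested lists with 1 for text pixels and takes the ground-truth skeleton as a precomputed input. The official H-DIBCO 2014 evaluation uses weighted pseudo-recall and pseudo-precision, and DRD is not covered here, so this sketch will not reproduce the plotted values exactly.

```python
def _overlap(pred, ref):
    """Count (pixels that are 1 in both images, 1-pixels in pred, 1-pixels in ref)."""
    both = pred_fg = ref_fg = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            pred_fg += p
            ref_fg += r
            both += p * r
    return both, pred_fg, ref_fg


def _harmonic_mean(a, b):
    return 2 * a * b / (a + b) if a + b > 0 else 0.0


def f_measure(pred, gt):
    """Standard F-measure of a binarization `pred` against ground truth `gt`."""
    tp, pred_fg, gt_fg = _overlap(pred, gt)
    precision = tp / pred_fg if pred_fg else 0.0
    recall = tp / gt_fg if gt_fg else 0.0
    return _harmonic_mean(precision, recall)


def pseudo_f_measure(pred, gt, gt_skeleton):
    """F-measure with recall replaced by pseudo-recall against `gt_skeleton` (assumed given)."""
    tp, pred_fg, _ = _overlap(pred, gt)
    precision = tp / pred_fg if pred_fg else 0.0
    skel_hits, _, skel_fg = _overlap(pred, gt_skeleton)
    pseudo_recall = skel_hits / skel_fg if skel_fg else 0.0
    return _harmonic_mean(precision, pseudo_recall)


# Toy example: a two-pixel-thick horizontal stroke and a hand-made skeleton.
gt = [[0, 0, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0]]
skeleton = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]]
pred = [[0, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 1]]

if __name__ == "__main__":
    print(f"F-Measure: {100 * f_measure(pred, gt):.1f} %")
    print(f"Pseudo F-Measure: {100 * pseudo_f_measure(pred, gt, skeleton):.1f} %")
```

In the toy example the prediction recovers only one row of the stroke plus a stray pixel: precision is 4/5 and recall is 4/8, giving an F-measure of about 61.5 %, while pseudo-recall is 4/4, giving a pseudo F-measure of about 88.9 %.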

SLIDE 17

Preliminary Results - Time

H-DIBCO 2014 Benchmark

[Figure: execution time in seconds for configurations I (CCC) to VIII (GGG), own implementation vs. reference implementation; the letters give the device (C = CPU, G = GPU) used for steps 1, 2 and 3]

SLIDE 18

Preliminary Results - Time (Cont.)

High Resolution Image

[Figure: execution time in seconds for configurations I (CCC) to VIII (GGG), own implementation vs. reference implementation; the letters give the device (C = CPU, G = GPU) used for steps 1, 2 and 3]

SLIDE 19

Preliminary Results - Time per Step

Time taken by each binarization step for the high-resolution image used:

Device | Step 1 | Step 2 | Step 3
CPU    | 2.27 s | 0.17 s | 28.76 s
GPU    | 0.39 s | 0.11 s | 14.54 s
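The per-step times allow a rough estimate of each configuration's total time by adding the time of every step on its assigned device, and they also give the GPU speedup per step. The Python sketch below does both. It assumes that the steps run strictly one after another and ignores CPU-GPU data transfers and other overheads, so the estimates are a rough guide, not a substitute for the measured times in the plots. The names `STEP_TIMES`, `step_speedups` and `estimated_totals` are hypothetical.

```python
# Per-step times in seconds for the high-resolution image, copied from the table above.
STEP_TIMES = {
    "CPU": (2.27, 0.17, 28.76),
    "GPU": (0.39, 0.11, 14.54),
}
NUMERALS = ["I", "II", "III", "IV", "V", "VI", "VII", "VIII"]


def step_speedups():
    """GPU speedup per step (CPU time / GPU time), rounded to two decimals."""
    return tuple(round(cpu / gpu, 2) for cpu, gpu in zip(STEP_TIMES["CPU"], STEP_TIMES["GPU"]))


def estimated_totals():
    """Naive total per configuration: sum of the per-step times on the assigned devices.

    Configuration k (0-based) runs step i+1 on the GPU if bit i of k is set,
    matching the I (CCC) ... VIII (GGG) labels. Transfers and overheads are ignored.
    """
    totals = {}
    for k, name in enumerate(NUMERALS):
        devices = ["GPU" if (k >> step) & 1 else "CPU" for step in range(3)]
        totals[name] = round(sum(STEP_TIMES[dev][step] for step, dev in enumerate(devices)), 2)
    return totals


if __name__ == "__main__":
    print("GPU speedup per step:", step_speedups())
    for name, seconds in estimated_totals().items():
        print(f"{name:>4}: {seconds:6.2f} s")
```

Under these assumptions step 3 accounts for about 92 % of the all-CPU estimate (28.76 s of 31.20 s), so moving step 3 to the GPU (configurations V to VIII) matters far more than the placement of steps 1 and 2, and configuration VIII gives the lowest estimate at 15.04 s. Step 1 has the largest relative speedup (about 5.8x), but it saves only 1.88 s in absolute terms.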

SLIDE 20

Next Steps

  • Revision of the implementation
  • Implementation of the binarization pipeline

SLIDE 21

Acknowledgements

We would like to thank ArkivDigital for providing us with access to their image database. This work is part of the research project "Scalable resource-efficient systems for big data analytics", funded by the Knowledge Foundation (grant: 20140032) in Sweden.
