Performance Evaluation of GANs in a semi-supervised OCR Use Case


slide-1
SLIDE 1

Performance Evaluation of GANs in a semi-supervised OCR Use Case

Florian Wilhelm London, 2018-10-11

slide-2
SLIDE 2

Dr. Florian Wilhelm

Principal Data Scientist @ inovex

@FlorianWilhelm · FlorianWilhelm · florianwilhelm.info

Special Interests

  • Mathematical Modelling
  • Recommendation Systems
  • Data Science in Production
  • Python Data Stack
  • Maintainer of PyScaffold

Florian Tanten

Master Thesis @ inovex, October 2017 - May 2018

slide-3
SLIDE 3

IT-project house for digital transformation:

  • Agile Development & Management
  • Web · UI/UX · Replatforming · Microservices
  • Mobile · Apps · Smart Devices · Robotics
  • Big Data & Business Intelligence Platforms
  • Data Science · Data Products · Search · Deep Learning
  • Data Center Automation · DevOps · Cloud · Hosting
  • Trainings & Coachings

Using technology to inspire our clients. And ourselves.

inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de

slide-4
SLIDE 4

Agenda

  • 1. Use Case
  • 2. Text Spotting
  • 3. Data and Pipeline
  • 4. Generative Adversarial Networks
  • 5. Semi-supervised Learning
  • 6. Results
slide-5
SLIDE 5

Vehicle Identification Number (VIN)

Unique identifier like a fingerprint of a vehicle

Components of a VIN: serial number · country · security code · model year · assembly plant · details · flexible fuel vehicles · manufacturer

Source: https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
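The structure of a VIN can be illustrated with a minimal sketch. It assumes the ISO 3779 layout (17 characters split into a manufacturer section, a descriptor section and an identifier section) and the ISO 3779 character set without I, O and Q. The function name `split_vin` and the dictionary keys are hypothetical and not taken from the talk.

```python
import string

# Assumption: ISO 3779 VINs have 17 characters and avoid I, O and Q,
# which are easily confused with 1 and 0.
VIN_ALPHABET = set(string.digits + string.ascii_uppercase) - set("IOQ")


def split_vin(vin: str) -> dict:
    """Split a VIN into its three ISO 3779 sections (hypothetical helper)."""
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError(f"expected 17 characters, got {len(vin)}")
    invalid = sorted(set(vin) - VIN_ALPHABET)
    if invalid:
        raise ValueError(f"invalid VIN characters: {invalid}")
    return {
        "wmi": vin[0:3],   # world manufacturer identifier (country, manufacturer)
        "vds": vin[3:9],   # vehicle descriptor section (vehicle details)
        "vis": vin[9:17],  # vehicle identifier section (includes the serial number)
    }


parts = split_vin("WF0DXXGAKDEJ37385")
print(parts)
```

A check like this could serve as a cheap plausibility test on OCR output before a decoder is queried.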

slide-6
SLIDE 6

Use Case

Spotting the vehicle identification number (VIN) in images of vehicle registration documents

VIN: WF0DXXGAKDEJ37385 → VIN-Decoder → Information about the car:

  • Manufacturer: BMW
  • Model: X3
  • Year: 2013-03-21
  • Engine power: 143 PS
  • Equipment: Xenon Lights, ...
slide-7
SLIDE 7

OCR Libraries

  • PyOCR
  • Commercial software
  • Open source tools

slide-8
SLIDE 8

„VSSZZZGJZHR03G533“

???

+

OCR with Tesseract

slide-9
SLIDE 9

Agenda

  • 1. Use Case
  • 2. Text Spotting
  • 3. Data and Pipeline
  • 4. Generative Adversarial Networks
  • 5. Semi-supervised Learning
  • 6. Results
slide-10
SLIDE 10

Methodology in Text Spotting

Spotting = Detection + Recognition

Character detection & extraction: Sliding Window · Computer Vision Tools · Others

  • Connected components
  • Stroke width transform
  • Edge detection
  • SVM
  • Learning with HOG
  • CNN
  • Region proposal
  • Hypotheses CNN pooling

Character recognition (character or word): CNN · CNN + RNN · SVM · Nearest Neighbor

High performers in current studies

Abbreviations:

  • CNN = Convolutional Neural Network
  • SVM = Support Vector Machine
  • HOG = Histogram of Oriented Gradients
  • RNN = Recurrent Neural Network
  • RL = Reinforcement Learning

Source: Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

slide-11
SLIDE 11

Convolutional Neural Network

  • Convolution with a 3x3 kernel and stride = 1
  • Max pooling with a 2x2 filter and stride = 2

Sources: https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
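The two operations named on this slide can be sketched in plain Python. The input image and the edge kernel below are hypothetical toy values chosen only to make the arithmetic easy to follow; a real CNN learns its kernels and would use a numerical library.

```python
def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    k = len(kernel)
    rows = (len(image) - k) // stride + 1
    cols = (len(image[0]) - k) // stride + 1
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * kernel[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def max_pool2d(image, size=2, stride=2):
    """Keep the maximum of each size x size block."""
    rows = (len(image) - size) // stride + 1
    cols = (len(image[0]) - size) // stride + 1
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# Hypothetical 6x6 input with a vertical edge between columns 2 and 3.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hypothetical 3x3 kernel that responds to vertical edges.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = conv2d(image, kernel, stride=1)       # 6x6 -> 4x4
pooled = max_pool2d(feature_map, size=2, stride=2)  # 4x4 -> 2x2
print(feature_map)
print(pooled)
```

With stride 1 and no padding, a 3x3 kernel shrinks each side by 2; the 2x2 pooling with stride 2 then halves each side.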

slide-12
SLIDE 12

Agenda

  • 1. Use Case
  • 2. Text Spotting
  • 3. Data and Pipeline
  • 4. Generative Adversarial Networks
  • 5. Semi-supervised Learning
  • 6. Results
slide-13
SLIDE 13

Objectives

  • 1. Implementation of a prototype
  • 2. Comparison of classifiers
      a) Supervised method
      b) Semi-supervised method

Dataset: ~170 images of vehicle registration documents

Text Spotting: „XLG0H200NA0A10348“

slide-14
SLIDE 14

End-to-End Text Spotting Pipeline

  • 1. Region of Interest Extractor → image depicting only the VIN
  • 2. Sliding window → all windows
  • 3. Character Detector (2 classes) → all windows with characters
  • 4. Non Maximum Suppression → only one window per character
  • 5. Character Recognizer (36 classes) → X L G H 2 N A 1 4 4 3 8
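Two of the pipeline stages, the sliding window and the non-maximum suppression, can be sketched as follows. This is a generic greedy variant under stated assumptions: the window size, step, scores and the IoU threshold of 0.3 are hypothetical values, and the talk does not say which NMS variant was used.

```python
def sliding_windows(width, height, win_w, win_h, step):
    """Yield (x1, y1, x2, y2) boxes, left to right and top to bottom."""
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: visit boxes by descending score and keep a box only
    if it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Hypothetical 8x4 strip scanned with 4x4 windows and a step of 2 pixels.
boxes = list(sliding_windows(width=8, height=4, win_w=4, win_h=4, step=2))
# Hypothetical detector scores ("probability of containing a character").
scores = [0.9, 0.6, 0.8]
kept = non_maximum_suppression(boxes, scores)
print(boxes)
print(kept)
```

Here the middle window overlaps the best-scoring one with an IoU of 1/3 and is dropped, so one window per character position remains.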

slide-15
SLIDE 15

Small Dataset

What to do about that?

  • 1. Data Generation
  • 2. Data Augmentation
slide-16
SLIDE 16

Data Augmentation

Data augmentation: original image, labeled manually as „0“

Datasets:

  • Character Detector (2 classes): Label „character“ or Label „no character“
  • Character Recognizer (36 classes): Label „0“
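How a single labeled patch can be multiplied into many training samples is sketched below. The talk does not state which transformations were used, so the small random shifts and the Gaussian pixel noise are assumptions, as are all function names and parameter values.

```python
import random


def shift(image, dx, dy, fill=0.0):
    """Translate a grayscale image by (dx, dy) pixels, padding with `fill`."""
    h, w = len(image), len(image[0])
    return [
        [
            image[y - dy][x - dx] if 0 <= y - dy < h and 0 <= x - dx < w else fill
            for x in range(w)
        ]
        for y in range(h)
    ]


def add_noise(image, sigma, rng):
    """Add Gaussian pixel noise and clip the result to [0, 1]."""
    return [
        [min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
        for row in image
    ]


def augment(image, n_copies, max_shift=2, sigma=0.05, seed=0):
    """Create n_copies randomly shifted and noised variants of one image."""
    rng = random.Random(seed)
    copies = []
    for _ in range(n_copies):
        dx = rng.randint(-max_shift, max_shift)
        dy = rng.randint(-max_shift, max_shift)
        copies.append(add_noise(shift(image, dx, dy), sigma, rng))
    return copies


# Hypothetical 8x8 patch with a bright vertical stroke in the middle.
patch = [[1.0 if 3 <= x <= 4 else 0.0 for x in range(8)] for _ in range(8)]
augmented = augment(patch, n_copies=5)
print(len(augmented), "augmented copies; each keeps the label of the original")
```

Every copy inherits the label of the manually labeled original, which is what lets a small number of labeled images grow into the much larger classifier datasets.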

slide-17
SLIDE 17

Datasets

170 images of vehicle registration documents, split into:

  • Training set: 85 images
  • Testing set: 85 images

Training sets of classifiers (training set after data augmentation):

  • Detector: ~42000 images, 2 classes
  • Recognizer: ~8000 images, 36 classes

Testing sets of classifiers (testing set after data augmentation):

  • Detector: ~42000 images, 2 classes
  • Recognizer: ~8000 images, 36 classes

Testing set of pipeline: 85 images

slide-18
SLIDE 18

Classifiers

  • 1. Supervised Convolutional Neural Network
  • 2. Semi-supervised Generative Adversarial Network

Generator Discriminator Input Feature extraction Classification

slide-19
SLIDE 19

Agenda

  • 1. Use Case
  • 2. Text Spotting
  • 3. Data and Pipeline
  • 4. Generative Adversarial Networks
  • 5. Semi-supervised Learning
  • 6. Results
slide-20
SLIDE 20

Yann LeCun

Director of Facebook AI Research, Prof at NYU

„... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.“

Ian J. Goodfellow @ Google Brain

slide-21
SLIDE 21

Generative Adversarial Network

  • Generator (G): Goal: generate images that seem realistic
  • Discriminator (D): Goal: differentiate between fake and real images

slide-22
SLIDE 22

Generative Adversarial Network

[Diagram: Generator (G) · Discriminator (D) · Real images · Real labeled images · class outputs A, B, ..., 8, 9, F]

  • Is D correct? „yes“
  • „D classified the generated image as 10% real“

slide-23
SLIDE 23

Mathematical formulation

Objective function:

  • Discriminator output for real images
  • Discriminator output for fake images
  • The discriminator calculates a likelihood in [0,1] for an image being real

Training (alternating):

  • Discriminator step: maximize the objective
  • Generator step: minimize the objective

Source: Goodfellow et al. (2014), „Generative Adversarial Networks“
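For reference, the minimax value function from Goodfellow et al. (2014) is written out here, under the assumption that the slide shows this original formulation. In it, x is a real image, z is a noise vector fed to the generator, and D(·) is the discriminator's estimated likelihood that its input is real.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

The first term is the discriminator output for real images and the second the output for fake images; in alternating training, D is updated to increase V and G to decrease it.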

slide-24
SLIDE 24

Example of generated images

  • Training images
  • Generated images during the learning process

slide-25
SLIDE 25

Agenda

  • 1. Use Case
  • 2. Text Spotting
  • 3. Data and Pipeline
  • 4. Generative Adversarial Networks
  • 5. Semi-supervised Learning
  • 6. Results
slide-26
SLIDE 26

Semi-supervised Learning

Supervised Learning · Unsupervised Learning · Semi-supervised Learning

  • Makes use of unlabeled data
  • Combines supervised and unsupervised learning

slide-27
SLIDE 27

Semi-supervised GAN for Character Detection

Real labeled images Real unlabeled images Generator Discriminator
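How the three image sources could enter the discriminator's loss is sketched below. The talk does not spell out the loss, so this assumes one common formulation of semi-supervised GANs in which the discriminator predicts the real classes plus one extra "fake" class. The class indices and function names are hypothetical.

```python
import math

# Assumption: 0 = character, 1 = no character, 2 = extra "fake" class.
FAKE = 2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_labeled(logits, label):
    """Real labeled image: cross-entropy on its true class."""
    return -math.log(softmax(logits)[label])


def loss_unlabeled(logits):
    """Real unlabeled image: any class is fine, as long as it is not 'fake'."""
    return -math.log(1.0 - softmax(logits)[FAKE])


def loss_generated(logits):
    """Generated image: should be assigned to the 'fake' class."""
    return -math.log(softmax(logits)[FAKE])


# Hypothetical discriminator logits for one image from each source.
total_loss = (
    loss_labeled([2.0, 0.0, -1.0], label=0)
    + loss_unlabeled([1.0, 1.5, -2.0])
    + loss_generated([-1.0, 0.0, 2.0])
)
print(round(total_loss, 3))
```

Only the first term needs labels, which is why the unlabeled images can still contribute a training signal.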

slide-28
SLIDE 28

Agenda

  • 1. Use Case
  • 2. Text Spotting
  • 3. Data and Pipeline
  • 4. Generative Adversarial Networks
  • 5. Semi-supervised Learning
  • 6. Results
slide-29
SLIDE 29

Character Detector (2 classes)

[Figure: accuracy (60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN and DCNN pretrained]

Pretraining of DCNN: manually generated images with CAPTCHA methods, labeled „Character“ and „No character“

slide-30
SLIDE 30

Character Detector (2 classes)

[Figure: accuracy (60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN, DCNN pretrained and Supervised GAN]

Supervised GAN: Generator · Discriminator · Real labeled images · discriminator outputs C, C, F

slide-31
SLIDE 31

Character Detector (2 classes)

[Figure: accuracy (60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN, DCNN pretrained, Supervised GAN and Semi-supervised GAN]

Semi-supervised GAN: Generator · Discriminator · Real labeled images · Real unlabeled images · discriminator outputs C, C, F

slide-32
SLIDE 32

Character Recognizer (36 classes)

[Figure: accuracy (0% to 100%) over size of labeled training set (36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000) for DCNN, DCNN pretrained and Supervised GAN, shown next to the Character Detector plot]

slide-33
SLIDE 33

End-to-End Text Spotting Pipeline

Test set: 85 images

Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non Maximum Suppression → Character Recognizer (36 classes)

Accuracy = 99.94%

slide-34
SLIDE 34

Our Approach vs. Google Cloud Vision API

Test set: 85 images of VINs

  • Our approach (Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes)): ∅ Levenshtein distance = 0.011
  • Google Cloud Vision API: ∅ Levenshtein distance = 4.49

Levenshtein distance, example: Classification „AYZ33“, Label „XYZ321“, distance = 3
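The metric can be reproduced with a short dynamic-programming sketch. The function names are hypothetical, and the second prediction in the usage example is made up for illustration; only the pair AYZ33 / XYZ321 comes from the slide.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


def mean_levenshtein(predictions, labels):
    """Average edit distance over paired predictions and labels."""
    distances = [levenshtein(p, l) for p, l in zip(predictions, labels)]
    return sum(distances) / len(distances)


example = levenshtein("AYZ33", "XYZ321")
print(example)  # 3: A→X, 3→2, and one inserted character

# Hypothetical mini test set: one wrong and one perfect prediction.
average = mean_levenshtein(["AYZ33", "XYZ321"], ["XYZ321", "XYZ321"])
print(average)
```

Averaging the distance over all test images gives the ∅ Levenshtein distance used for the comparison; a value of 0 would mean every VIN was read without a single error.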

slide-35
SLIDE 35

Key Learnings

  • Custom solutions can tremendously outperform off-the-shelf software in a specific use case
  • Semi-supervised GANs can be successfully applied in use cases with little data
  • With simple data augmentation techniques, even little data might be enough

slide-36
SLIDE 36

Bibliography

  • Krizhevsky et al. (2012), „ImageNet Classification with Deep Convolutional Neural Networks“
  • Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“
  • Girshick (2015), „Fast R-CNN“
  • Ren et al. (2015), „Faster R-CNN“
  • He et al. (2017), „Mask R-CNN“
  • Goodfellow et al. (2014), „Generative Adversarial Networks“
slide-37
SLIDE 37

Thank you!

Florian Wilhelm Principal Data Scientist inovex GmbH Schanzenstraße 6-20 Kupferhütte 1.13 51063 Köln florian.wilhelm@inovex.de