Performance Evaluation of GANs in a semi-supervised OCR Use Case
Florian Wilhelm
London, 2018-10-11
Dr. Florian Wilhelm
Principal Data Scientist @ inovex
@FlorianWilhelm · FlorianWilhelm · florianwilhelm.info
Special Interests
- Mathematical Modelling
- Recommendation Systems
- Data Science in Production
- Python Data Stack
- Maintainer of PyScaffold
Florian Tanten
Master Thesis @ inovex, October 2017 - May 2018
IT-project house for digital transformation:
- Agile Development & Management
- Web · UI/UX · Replatforming · Microservices
- Mobile · Apps · Smart Devices · Robotics
- Big Data & Business Intelligence Platforms
- Data Science · Data Products · Search · Deep Learning
- Data Center Automation · DevOps · Cloud · Hosting
- Trainings & Coachings
Using technology to inspire our clients. And ourselves.
inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de
Agenda
- 1. Use Case
- 2. Text Spotting
- 3. Data and Pipeline
- 4. Generative Adversarial Networks
- 5. Semi-supervised Learning
- 6. Results
https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
Vehicle Identification Number (VIN)
Unique identifier like a fingerprint of a vehicle
Parts of a VIN (figure labels): serial number, country, security code, model year, assembly plant, details, flexible fuel vehicles, manufacturer
Use Case
Spotting the vehicle identification number (VIN) in images of vehicle registration documents
VIN: WF0DXXGAKDEJ37385
VIN-Decoder
Information about the car:
Manufacturer: BMW
Model: X3
Year: 2013-03-21
Engine power: 143 PS
Equipment:
- Xenon Lights
...
OCR Libraries
PyOCR
Commercial software · Open source tools
OCR with Tesseract
„VSSZZZGJZHR03G533“
???
Agenda
- 1. Use Case
- 2. Text Spotting
- 3. Data and Pipeline
- 4. Generative Adversarial Networks
- 5. Semi-supervised Learning
- 6. Results
Character detection & extraction · Character recognition
Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“
Methodology in Text Spotting
Sliding Window · Computer Vision Tools · Others
- Connected components
- Stroke width transform
- Edge detection
- SVM
- Learning with HOG
- CNN
- Region proposal
- Hypotheses CNN pooling
Character or word
CNN · CNN + RNN · SVM · Nearest Neighbor
High performers in current studies
CNN = Convolutional Neural Network · SVM = Support Vector Machine · HOG = Histogram of Oriented Gradients · RNN = Recurrent Neural Network · RL = Reinforcement Learning
379 Character Recognition ...
Spotting = Detection + Recognition
https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
Convolutional Neural Network
- Convolution with 3x3 kernel and stride = 1
- Max pooling with a 2x2 filter and stride = 2
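The two operations named on this slide can be sketched in plain NumPy. This is a minimal illustration, not the network used in the talk: the 6x6 input, the averaging kernel, and the function names `conv2d` and `max_pool2d` are assumptions chosen only to make the shapes easy to follow.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Valid (unpadded) 2D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of every size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = feature_map[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()
    return out

# Hypothetical 6x6 input and 3x3 averaging kernel (illustration only).
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0

features = conv2d(image, kernel, stride=1)       # 6x6 input -> 4x4 feature map
pooled = max_pool2d(features, size=2, stride=2)  # 4x4 feature map -> 2x2 output
```

A 3x3 kernel with stride 1 shrinks each side by two pixels, and 2x2 pooling with stride 2 halves each side, which is why the 6x6 input ends up as a 2x2 output.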
Agenda
- 1. Use Case
- 2. Text Spotting
- 3. Data and Pipeline
- 4. Generative Adversarial Networks
- 5. Semi-supervised Learning
- 6. Results
Objectives
- 1. Implementation of a prototype
- 2. Comparison of classifiers
a) Supervised method
b) Semi-supervised method
Dataset: Text Spotting
- ~170 images of vehicle registration documents
„XLG0H200NA0A10348“
End-to-End Text Spotting Pipeline
- 1. Region of Interest Extractor: image depicting only the VIN
- 2. Sliding window: all windows
- 3. Character Detector (2 classes): all windows with characters
- 4. Non Maximum Suppression: only one window per character
- 5. Character Recognizer (36 classes): X L G H 2 N A 1 4 4 3 8
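The sliding-window and non-maximum-suppression steps of this pipeline can be sketched as follows. This is a minimal sketch under stated assumptions: the window size, step, IoU threshold, and detector scores are hypothetical, and the greedy IoU-based suppression shown here is the common textbook variant, not necessarily the one used in the talk.

```python
import numpy as np

def sliding_windows(image_shape, window=(32, 32), step=8):
    """Yield (x1, y1, x2, y2) boxes that cover the image."""
    height, width = image_shape
    win_h, win_w = window
    for y in range(0, height - win_h + 1, step):
        for x in range(0, width - win_w + 1, step):
            yield (x, y, x + win_w, y + win_h)

def iou(box, boxes):
    """Intersection over union of one box with an array of boxes."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    intersection = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return intersection / (area + areas - intersection)

def non_maximum_suppression(boxes, scores, iou_threshold=0.3):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_threshold]
    return keep

# Hypothetical detector output: two windows on the same character, one on another.
windows = list(sliding_windows((32, 64)))
boxes = [(0, 0, 32, 32), (4, 0, 36, 32), (40, 0, 72, 32)]
scores = [0.9, 0.8, 0.95]
kept = non_maximum_suppression(boxes, scores)
```

In this example the second box overlaps the first heavily and has the lower score, so only one window per character remains.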
Small Dataset
What to do about that?
- 1. Data Generation
- 2. Data Augmentation
Data Augmentation
Data augmentation: original image labeled manually as „0“
Datasets:
- Character Recognizer (36 classes): Label: „0“
- Character Detector (2 classes): Label: „character“ or Label: „no character“
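The slide does not state which augmentation techniques were used. As a minimal sketch, assuming small random shifts plus pixel noise on a grayscale character crop with values in [0, 1], one manually labeled image can be turned into several training samples that all keep the original label. The function names and all parameters below are hypothetical.

```python
import numpy as np

def random_shift(image, rng, max_shift=2):
    """Shift the image by up to max_shift pixels, padding with edge values."""
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    padded = np.pad(image, max_shift, mode="edge")
    y0 = max_shift + dy
    x0 = max_shift + dx
    return padded[y0:y0 + image.shape[0], x0:x0 + image.shape[1]]

def augment(image, label, n_copies, rng, max_shift=2, noise_std=0.05):
    """Create n_copies perturbed versions of one labeled image."""
    samples = []
    for _ in range(n_copies):
        shifted = random_shift(image, rng, max_shift)
        noise = rng.normal(0.0, noise_std, shifted.shape)
        noisy = np.clip(shifted + noise, 0.0, 1.0)
        samples.append((noisy, label))  # the label is unchanged by augmentation
    return samples

# Hypothetical 8x8 crop labeled manually as "0".
rng = np.random.default_rng(0)
image = np.zeros((8, 8))
image[2:6, 3:5] = 1.0
samples = augment(image, "0", n_copies=5, rng=rng)
```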
Datasets
170 images of vehicle registration documents
- Training set: 85 images. Data augmentation yields the training sets of the classifiers: Detector ~42000 images (2 classes), Recognizer ~8000 images (36 classes)
- Testing set: 85 images. Data augmentation yields the testing sets of the classifiers: Detector ~42000 images (2 classes), Recognizer ~8000 images (36 classes)
- Testing set of the pipeline: 85 images
Classifiers
- 1. Supervised Convolutional Neural Network
- 2. Semi-supervised Generative Adversarial Network
Generator Discriminator Input Feature extraction Classification
Agenda
- 1. Use Case
- 2. Text Spotting
- 3. Data and Pipeline
- 4. Generative Adversarial Networks
- 5. Semi-supervised Learning
- 6. Results
Yann LeCun
Director of Facebook AI Research, Prof at NYU
“... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.“
Ian J. Goodfellow @ Google Brain
Generative Adversarial Network
- Generator (G). Goal: generate images that seem realistic
- Discriminator (D). Goal: differentiate between fake and real images
Generative Adversarial Network
Generator (G) Discriminator (D) Is D correct?
„D classified the generated image as 10% real“ „yes“
A B . . . 8 9 F Real images Real labeled images
Goodfellow et al. (2014), Generative Adversarial Networks
Mathematical formulation
Objective function
- Discriminator output for real images
- Discriminator output for fake images
- Discriminator calculates likelihood [0,1] for an image being real
Training (alternating)
- Maximizing discriminator loss
- Minimizing generator loss
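Assuming the slide refers to the minimax objective of the cited Goodfellow et al. (2014) paper, the objective function can be written as below. Here D(x) is the discriminator output for real images and D(G(z)) is the discriminator output for fake images. The symbols p_data, p_z, and the generator's noise input z follow the paper's notation and are not taken from the slide.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]
```

In alternating training, one step updates D to increase V while G is held fixed, and the next step updates G to decrease V while D is held fixed.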
Example of generated images
Training images: Generated images during learning process:
Agenda
- 1. Use Case
- 2. Text Spotting
- 3. Data and Pipeline
- 4. Generative Adversarial Networks
- 5. Semi-supervised Learning
- 6. Results
Semi-supervised Learning
Supervised Learning Unsupervised Learning Semi-supervised Learning
- Makes use of unlabeled data
- Combines supervised and unsupervised learning
Semi-supervised GAN for Character Detection
Real labeled images Real unlabeled images Generator Discriminator
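The slides do not give the loss used for the semi-supervised GAN. One common formulation, assumed here for illustration only, lets the discriminator predict K real classes plus one extra "fake" class: labeled images contribute a cross-entropy term, unlabeled real images are pushed away from the fake class, and generated images are pushed towards it. The function names and the toy logits below are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

def discriminator_loss(logits_labeled, labels, logits_unlabeled, logits_fake):
    """Logits have K + 1 columns; the last column is the 'fake' class."""
    fake = logits_labeled.shape[1] - 1

    # Supervised term: cross-entropy on real labeled images.
    log_p_labeled = log_softmax(logits_labeled)
    supervised = -log_p_labeled[np.arange(len(labels)), labels].mean()

    # Real unlabeled images should fall into any of the K real classes.
    log_p_unlabeled = log_softmax(logits_unlabeled)
    log_p_real = np.log(np.exp(log_p_unlabeled[:, :fake]).sum(axis=1))
    unlabeled_term = -log_p_real.mean()

    # Generated images should fall into the fake class.
    fake_term = -log_softmax(logits_fake)[:, fake].mean()

    return supervised + unlabeled_term + fake_term

# Toy detector setting: K = 2 real classes ("character", "no character") + fake.
uninformed = discriminator_loss(
    np.zeros((4, 3)), np.array([0, 1, 0, 1]), np.zeros((5, 3)), np.zeros((5, 3))
)
confident = discriminator_loss(
    np.array([[10.0, 0.0, 0.0], [0.0, 10.0, 0.0]]),
    np.array([0, 1]),
    np.array([[10.0, 0.0, 0.0]]),
    np.array([[0.0, 0.0, 10.0]]),
)
```

A discriminator that outputs uniform probabilities gets a loss of log(13.5), while one that is confidently correct on all three kinds of input gets a loss close to zero.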
Agenda
- 1. Use Case
- 2. Text Spotting
- 3. Data and Pipeline
- 4. Generative Adversarial Networks
- 5. Semi-supervised Learning
- 6. Results
Character Detector (2 classes)
Pretraining of DCNN: manually generated images with CAPTCHA methods, labeled „Character“ or „No character“
Figure: accuracy (60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN and DCNN pretrained
Character Detector (2 classes)
Supervised GAN: Generator, Discriminator, real labeled images (figure labels: C, C, F)
Figure: accuracy (60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN, DCNN pretrained and Supervised GAN
Character Detector (2 classes)
Semi-supervised GAN: Generator, Discriminator, real labeled images, real unlabeled images (figure labels: C, C, F)
Figure: accuracy (60% to 100%) over size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000) for DCNN, DCNN pretrained, Supervised GAN and Semi-supervised GAN
Character Recognizer (36 classes)
Figure, panel Character Recognizer: accuracy (0% to 100%) over size of labeled training set (36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000)
Figure, panel Character Detector: accuracy (60% to 100%) over size of labeled training set
Legend: DCNN, DCNN pretrained, Supervised GAN
End-to-End Text Spotting Pipeline
Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes)
Tested on 85 images
Accuracy = 99.94%
Google Cloud Vision API
Our Approach vs. Google Cloud Vision API on 85 images of VINs
- Our Approach (Region of Interest Extractor, Sliding window, Character Detector (2 classes), Non Maximum Suppression, Character Recognizer (36 classes)): ∅ Levenshtein distance = 0.011
- Google Cloud Vision API: ∅ Levenshtein distance = 4.49
Levenshtein distance, example:
Classification: AYZ33 · Label: XYZ321 · Distance = 3
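The Levenshtein distance used in this comparison counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. A minimal sketch of the standard dynamic-programming computation follows; the function names are assumptions, and the averaging helper only illustrates how a mean distance over a test set could be obtained.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions from a to b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

def mean_levenshtein(classifications, labels):
    """Average distance between predicted strings and their labels."""
    distances = [levenshtein(c, l) for c, l in zip(classifications, labels)]
    return sum(distances) / len(distances)

# The example from the slide: classification "AYZ33" against label "XYZ321".
example = levenshtein("AYZ33", "XYZ321")
```

For the slide's example the result is 3: one substitution (A to X), one further substitution, and one insertion at the end of the string.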
Key Learnings
- Custom solutions can tremendously outperform off-the-shelf software in a specific use case
- Semi-supervised GANs can be successfully applied in use cases with little data
- With simple data augmentation techniques, having only little data might be enough
Bibliography
- Krizhevsky et al. (2012), „ImageNet Classification with Deep Convolutional Neural Networks“
- Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“
- Girshick (2015), „Fast R-CNN“
- Ren et al. (2015), „Faster R-CNN“
- He et al. (2017), „Mask R-CNN“
- Goodfellow et al. (2014), „Generative Adversarial Networks“