Performance Evaluation of GANs in a semi-supervised OCR Use Case

Performance Evaluation of GANs in a semi-supervised OCR Use Case Florian Wilhelm London, 2018-10-11

Dr. Florian Wilhelm, Principal Data Scientist @ inovex
@FlorianWilhelm · FlorianWilhelm · florianwilhelm.info
Special Interests:
• Mathematical Modelling
• Recommendation Systems
• Data Science in Production
• Python Data Stack
• Maintainer of PyScaffold
Florian Tanten, Master Thesis @ inovex, October 2017 - May 2018

IT-project house for digital transformation:
‣ Agile Development & Management
‣ Web · UI/UX · Replatforming · Microservices
‣ Mobile · Apps · Smart Devices · Robotics
‣ Big Data & Business Intelligence Platforms
‣ Data Science · Data Products · Search · Deep Learning
‣ Data Center Automation · DevOps · Cloud · Hosting
‣ Trainings & Coachings
inovex offices in Karlsruhe · Cologne · Munich · Pforzheim · Hamburg · Stuttgart. www.inovex.de
Using technology to inspire our clients. And ourselves.

Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

Vehicle Identification Number (VIN)
Unique identifier, like a fingerprint of a vehicle
Fields labeled in the figure: country · manufacturer · details · flexible fuel vehicles · security code · model year · assembly plant · serial number
Source: https://www.autocheck.com/vehiclehistory/autocheck/en/vinbasics
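The field layout above can be illustrated with a small sketch that splits a 17-character VIN by position. The position ranges used here follow the common North American convention (manufacturer identifier in positions 1-3, vehicle details in 4-8, check digit in 9, model year in 10, assembly plant in 11, serial number in 12-17); this mapping and the function name `split_vin` are assumptions for illustration and are not taken from the slides.

```python
def split_vin(vin: str) -> dict:
    """Split a 17-character VIN into positional fields.

    Assumption: North American position layout; other regions may use
    positions 4-11 differently.
    """
    vin = vin.strip().upper()
    if len(vin) != 17:
        raise ValueError(f"expected 17 characters, got {len(vin)}")
    return {
        "wmi": vin[0:3],          # world manufacturer identifier (country + manufacturer)
        "details": vin[3:8],      # vehicle attributes
        "check_digit": vin[8],    # "security code"
        "model_year": vin[9],
        "assembly_plant": vin[10],
        "serial_number": vin[11:17],
    }


# Example VIN string taken from the use case slide.
parts = split_vin("WF0DXXGAKDEJ37385")
print(parts)
```

Such a split only slices the string; it does not validate the check digit or decode any field.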

Use Case
Spotting the vehicle identification number (VIN) in images of vehicle registration documents
VIN: WF0DXXGAKDEJ37385 → VIN-Decoder → Information about the car:
Manufacturer: BMW
Model: X3
Year: 2013-03-21
Engine power: 143 PS
Equipment:
- Xenon Lights
...

OCR Libraries: Open source tools · Commercial software · PyOCR

OCR with Tesseract „VSSZZZGJZHR03G533“ + ???

Methodology in Text Spotting
Spotting = Detection + Recognition
Character detection & extraction:
• Computer Vision Tools: connected components, stroke width transform, edge detection
• Sliding Window: SVM, learning with HOG, CNN
• Others: region proposal, hypotheses pooling, ...
Character recognition:
• SVM
• Nearest Neighbor
• CNN (character or word)
• CNN + RNN (high performer in current studies)
Abbreviations: CNN = Convolutional Neural Network, SVM = Support Vector Machine, HOG = Histogram of Oriented Gradients, RNN = Recurrent Neural Network, RL = Reinforcement Learning
Girshick et al. (2014), „Region-Based Convolutional Networks for Accurate Object Detection and Segmentation“

Convolutional Neural Network
Convolution with 3x3 kernel and stride = 1
Max pooling with a 2x2 filter and stride = 2
Sources: https://en.wikipedia.org/wiki/Convolutional_neural_network; http://intellabs.github.io/ParallelJavaScript/
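The two operations named on this slide can be sketched in plain Python. This is a minimal illustration on a 6x6 toy image, not the network used in the talk; the function names, the all-ones kernel and the input values are assumptions chosen so the arithmetic is easy to follow.

```python
def conv2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image without padding.

    As in most CNN frameworks, the kernel is not flipped
    (strictly speaking this is a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                image[i * stride + di][j * stride + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum of each size x size block."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy input: 6x6 image with values 0..35, and a 3x3 kernel of ones.
image = [[6 * r + c for c in range(6)] for r in range(6)]
kernel = [[1, 1, 1] for _ in range(3)]

features = conv2d_valid(image, kernel, stride=1)  # 6x6 -> 4x4
pooled = max_pool2d(features, size=2, stride=2)   # 4x4 -> 2x2
print(pooled)
```

A 3x3 kernel with stride 1 shrinks each side by 2, and 2x2 pooling with stride 2 halves it, which is why the 6x6 input ends up as a 2x2 map.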

Agenda 1. Use Case 2. Text Spotting 3. Data and Pipeline 4. Generative Adversarial Networks 5. Semi-supervised Learning 6. Results

Objectives
Dataset: ~170 images of vehicle registration documents
1. Implementation of a prototype: image → Text Spotting → „XLG0H200NA0A10348“
2. Comparison of classifiers
a) Supervised method
b) Semi-supervised method

End-to-End Text Spotting Pipeline
1. Region of Interest Extractor → image depicting only the VIN
2. Sliding window → all windows
3. Character Detector (2 classes) → all windows with characters
4. Non Maximum Suppression → only one window per character
5. Character Recognizer (36 classes) → X L G 0 H 2 0 0 N A 0 4 1 0 3 4 8
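The sliding window and non maximum suppression stages of this pipeline can be sketched as follows. This is a simplified one-dimensional illustration along the width of the VIN strip; the window width, step size, score threshold, overlap threshold and the detector scores are all hypothetical values, and the function names are not from the original work.

```python
def sliding_windows(image_width, window_width, step):
    """Return (start, end) pixel ranges of all windows along the strip."""
    return [
        (x, x + window_width)
        for x in range(0, image_width - window_width + 1, step)
    ]


def overlap_ratio(a, b):
    """Intersection over union of two 1-D windows."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


def non_maximum_suppression(windows, scores, max_overlap=0.3):
    """Greedily keep the best-scoring window among overlapping ones."""
    order = sorted(range(len(windows)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(overlap_ratio(windows[i], windows[k]) <= max_overlap for k in kept):
            kept.append(i)
    # Return the surviving windows in reading order (left to right).
    return [windows[i] for i in sorted(kept, key=lambda i: windows[i][0])]


# Hypothetical strip of width 40 px, windows of 10 px, step of 5 px.
windows = sliding_windows(image_width=40, window_width=10, step=5)

# Hypothetical "character" probabilities from the detector, one per window.
scores = [0.2, 0.9, 0.6, 0.1, 0.7, 0.95, 0.3]

# Detector stage: keep windows classified as "character".
detected = [(w, s) for w, s in zip(windows, scores) if s >= 0.5]

# NMS stage: only one window per character remains.
kept = non_maximum_suppression(
    [w for w, _ in detected], [s for _, s in detected]
)
print(kept)
```

Each remaining window would then be passed to the 36-class recognizer to read one character.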

Small Dataset: What to do about that? 1. Data Generation 2. Data Augmentation

Data Augmentation
Original image, labeled manually as „0“
Data augmentation:
• Character Recognizer (36 classes): Label: „0“
• Character Detector (2 classes): Label: „character“ or Label: „no character“
Datasets: 36 classes (Recognizer), 2 classes (Detector)

Datasets
170 images of vehicle registration documents, split into a training set (85 images) and a testing set (85 images)
Training sets of classifiers (after data augmentation): Detector ~ 42000 images, 2 classes · Recognizer ~ 8000 images, 36 classes
Testing sets of classifiers (after data augmentation): Detector ~ 42000 images, 2 classes · Recognizer ~ 8000 images, 36 classes
Testing set of the pipeline: 85 images

Classifiers
1. Supervised: Convolutional Neural Network (Input → Feature extraction → Classification)
2. Semi-supervised: Generative Adversarial Network (Generator, Discriminator)

“... (GANs) and the variations that are now being proposed is the most interesting idea in the last 10 years in ML, in my opinion.”
Yann LeCun, Director of Facebook AI Research, Prof at NYU
Ian J. Goodfellow @ Google Brain

Generative Adversarial Network
Generator (G): Goal: Generate images which seem to be realistic
Discriminator (D): Goal: Differentiate between fake and real images

Generative Adversarial Network
[Diagram: real images and images from the Generator (G) are fed to the Discriminator (D); D answers whether an image is real („yes“); the check „Is D correct?“ drives training; feedback to G: „D classified the generated image as 10% real“]

Mathematical formulation
Objective function: the discriminator calculates the likelihood [0,1] for an image being real
Terms: discriminator output for real images · discriminator output for fake images
Training (alternating): Maximizing discriminator loss · Minimizing generator loss
Goodfellow et al. (2014), Generative Adversarial Networks
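The formula on this slide did not survive extraction. For reference, the minimax objective from the cited paper, Goodfellow et al. (2014), is given below. It is an assumption that the slide showed exactly this form; here D(x) is the discriminator output for real images and D(G(z)) is its output for fake images generated from noise z.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

In alternating steps, D is updated to increase V and G is updated to decrease it.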

Example of generated images
Training images:
Generated images during learning process:

Semi-supervised Learning
• Makes use of unlabeled data
• Combines supervised and unsupervised learning
[Diagram: Unsupervised Learning · Supervised Learning · Semi-supervised Learning]

Semi-supervised GAN for Character Detection
[Diagram: real labeled images, real unlabeled images and images from the Generator are fed to the Discriminator]
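One common way to train a discriminator on these three kinds of input is to give it one extra output class for generated images, as proposed by Salimans et al. (2016). The slides do not state the loss that was used, so the following is a sketch under that assumption; the class indices, function names and example logits are hypothetical.

```python
import math

# Assumed class layout: 0 = "character", 1 = "no character", 2 = "fake".
FAKE = 2


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean(values):
    return sum(values) / len(values) if values else 0.0


def discriminator_loss(labeled, unlabeled, generated, eps=1e-12):
    """Sum of three mean negative log-likelihood terms.

    labeled:   list of (logits, label) for real labeled images
    unlabeled: list of logits for real unlabeled images
    generated: list of logits for images from the generator
    """
    # Real labeled images should get their true class.
    supervised = [-math.log(max(softmax(l)[y], eps)) for l, y in labeled]
    # Real unlabeled images should get any class except "fake".
    real = [-math.log(max(1.0 - softmax(l)[FAKE], eps)) for l in unlabeled]
    # Generated images should get the "fake" class.
    fake = [-math.log(max(softmax(l)[FAKE], eps)) for l in generated]
    return mean(supervised) + mean(real) + mean(fake)


# An undecided discriminator (all logits equal) ...
undecided = discriminator_loss(
    labeled=[([0.0, 0.0, 0.0], 0)],
    unlabeled=[[0.0, 0.0, 0.0]],
    generated=[[0.0, 0.0, 0.0]],
)
# ... versus one that is confidently right on all three inputs.
confident = discriminator_loss(
    labeled=[([5.0, 0.0, 0.0], 0)],
    unlabeled=[[0.0, 5.0, 0.0]],
    generated=[[0.0, 0.0, 5.0]],
)
print(undecided, confident)
```

The unlabeled term is what lets the discriminator learn from images without labels: it only has to decide that they are real, not which class they belong to.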

Character Detector (2 classes): Pretraining of DCNN
Pretraining on manually generated images with CAPTCHA methods („Character“ / „No character“)
[Plot: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000); series: DCNN, DCNN pretrained]

Character Detector (2 classes): Supervised GAN
[Diagram: real labeled images and images from the Generator are fed to the Discriminator; outputs shown as C, C, F]
[Plot: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000); series: DCNN, DCNN pretrained, Supervised GAN]

Character Detector (2 classes): Semi-supervised GAN
[Diagram: real labeled images, real unlabeled images and images from the Generator are fed to the Discriminator; outputs shown as C, C, F]
[Plot: accuracy (60% to 100%) vs. size of labeled training set (20, 50, 100, 200, 400, 700, 1000, 5000, 15000, 30000, 42000); series: DCNN, DCNN pretrained, Supervised GAN, Semi-supervised GAN]

Character Recognizer (36 classes)
[Plot, Character Recognizer: accuracy (0% to 100%) vs. size of labeled training set (36, 72, 108, 200, 300, 400, 600, 800, 1000, 5000, 8000)]
[Plot, Character Detector: accuracy (60% to 100%) vs. size of labeled training set]
Legend: DCNN, DCNN pretrained, Supervised GAN

End-to-End Text Spotting Pipeline
85 images → Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non Maximum Suppression → Character Recognizer (36 classes) → results 1. to 85.
Accuracy = 99.94%

Google Cloud Vision API vs. Our Approach
Google Cloud Vision API on 85 images of VINs: ∅ Levenshtein distance = 4.49
Our approach on 85 images (Region of Interest Extractor → Sliding window → Character Detector (2 classes) → Non Maximum Suppression → Character Recognizer (36 classes)): ∅ Levenshtein distance = 0.011
Levenshtein distance between classification and label, e.g. AYZ33 vs. XYZ321 = 3
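The Levenshtein distance used for this comparison counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A minimal sketch of the standard dynamic-programming computation, together with the average over a test set, could look like this; the function names and the two-item example list are illustrative and not from the original evaluation code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # delete char_a
                    current[j - 1] + 1,                    # insert char_b
                    previous[j - 1] + (char_a != char_b),  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def mean_levenshtein(predictions, labels):
    """Average distance between predicted and true strings."""
    distances = [levenshtein(p, l) for p, l in zip(predictions, labels)]
    return sum(distances) / len(distances)


# The example from the slide: classification AYZ33, label XYZ321.
example = levenshtein("AYZ33", "XYZ321")

# Illustrative two-item test set: one exact match, one wrong character.
average = mean_levenshtein(["ABC", "ABD"], ["ABC", "ABC"])
print(example, average)
```

An average close to 0 means almost every VIN is read without a single character error, while an average above 4 means several characters per VIN are wrong.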

Key Learnings
• Custom solutions can tremendously outperform off-the-shelf software in a specific use case
• Semi-supervised GANs can be successfully applied in use cases with little data
• With simple data augmentation techniques, having only little data might be enough
