Learning from Limited Data
The Univ. of Tokyo / RIKEN AIP Tatsuya Harada
GTC March 29, 2018
Background
Deep learning (DL) is one of the most successful machine learning methods, but it generally requires a huge amount of annotated data, and the annotation cost is very high.
Challenge
Obtaining high-quality deep neural networks from limited data
Topics
1. A learning method for supervised learning from limited data
2. Unsupervised domain adaptation using classifier discrepancy
Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada:
Learning from Between-class Examples for Deep Sound Recognition. To appear in ICLR 2018.
Between-class Learning for Image Classification. To appear in CVPR 2018.
Learning from Limited Data
https://github.com/mil-tokyo/bc_learning_sound https://github.com/mil-tokyo/bc_learning_image
[Figure: standard-learning pipeline — one input is randomly selected and augmented from the training dataset (Dog, Cat, Bird); the model is trained toward a one-hot output (Dog 1, Cat 0, Bird 0).]
1. Select one example from the training dataset.
2. Train the model to output 1 for the corresponding class and 0 for the other classes.
1. Select two training examples from different classes.
2. Mix those examples with a random ratio.
3. Train the model to output the mixing ratio and the mixed classes.
Proposed method
[Figure: BC-learning pipeline — two examples are randomly selected and augmented from the training dataset (Dog, Cat, Bird); the model output is trained with a KL-divergence loss toward the mixed label (Dog 0.7, Cat 0.3, Bird 0).]
At test time, we input a single (unmixed) example into the network.
Merits:
Generate infinite training data from limited data.
Learn a more discriminative feature space than standard learning.
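The mixing step can be sketched in a few lines of numpy. This is a minimal illustration, not the authors' implementation: the toy vectors stand in for real inputs, and in practice the KL loss is taken against the network's softmax output.

```python
import numpy as np

rng = np.random.default_rng(0)

def bc_mix(x1, y1, x2, y2, num_classes=3):
    """Mix two training examples (from different classes) with a random
    ratio r, and build the matching soft label (the mixing ratio)."""
    r = rng.uniform()                       # random mixing ratio in (0, 1)
    x = r * x1 + (1 - r) * x2               # mixed input
    t = r * np.eye(num_classes)[y1] + (1 - r) * np.eye(num_classes)[y2]
    return x, t

def kl_loss(t, p, eps=1e-12):
    """KL divergence between the target ratio t and the model output p."""
    return float(np.sum(t * np.log((t + eps) / (p + eps))))

# toy "dog" and "cat" inputs (random vectors standing in for images/sounds)
x_dog, x_cat = rng.normal(size=8), rng.normal(size=8)
x_mixed, t = bc_mix(x_dog, 0, x_cat, 1)
print(t)  # soft label: ratio r for Dog, 1 - r for Cat, 0 for Bird
```

Because the ratio is drawn fresh at every step, each epoch sees different mixtures of the same limited examples.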
[Figure: two training sounds with labels (Dog: 1, Cat: 0, Bird: 0) and (Dog: 0, Cat: 1, Bird: 0) are mixed with a random ratio 0.7 : 0.3, giving the soft label (Dog: 0.7, Cat: 0.3, Bird: 0).]
G1, G2: sound pressure levels of x1, x2 [dB]. The two sounds are mixed as (p·x1 + (1 − p)·x2) / √(p² + (1 − p)²), with p = 1 / (1 + 10^((G1 − G2)/20) · (1 − r)/r), so that the mixing ratio r reflects perceived loudness. (Example: mixing a dog and a cat sound.)
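A numpy sketch of this sound-pressure-aware mixing; the formula follows the BC-learning sound paper (ICLR 2018), while the function names and toy waveforms are illustrative assumptions:

```python
import numpy as np

def spl_db(x, eps=1e-12):
    """Sound pressure level of a waveform in dB (relative, RMS-based)."""
    return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + eps)

def bc_mix_sound(x1, x2, r):
    """Mix two waveforms so that the ratio r reflects perceived loudness:
    p compensates for the difference in sound pressure levels G1, G2."""
    g1, g2 = spl_db(x1), spl_db(x2)
    p = 1.0 / (1.0 + 10.0 ** ((g1 - g2) / 20.0) * (1.0 - r) / r)
    return (p * x1 + (1.0 - p) * x2) / np.sqrt(p ** 2 + (1.0 - p) ** 2)

t = np.linspace(0.0, 1.0, 16000)
dog = 0.5 * np.sin(2 * np.pi * 200 * t)  # toy, loud "dog" waveform
cat = 0.1 * np.sin(2 * np.pi * 800 * t)  # toy, quiet "cat" waveform
mixed = bc_mix_sound(dog, cat, r=0.5)
```

For r = 0.5 the louder waveform receives a weight p < 0.5, so the two sounds are equally audible in the mixture.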
① Works with various models
② Works on various datasets
③ Compatible with strong data augmentation
④ Surpasses the human level
We can improve recognition performance for any sound-recognition network by applying BC learning.
An image consists of a static (DC) component and a wave component. The static component would not be important, or could even have a harmful effect, if CNNs treated the input data as waveforms.
Proposal 1
[Figure: images as waveforms — pure examples (Dog 1.0, Cat 1.0) and a mixed example (Dog 0.5, Cat 0.5).]
Proposal 2 (BC+)
Our preliminary results were presented in ILSVRC2017 on July 26, 2017.
Top-1 / top-5 validation error (%) on ImageNet:

             Standard          BC (ours)
100 epochs   20.4 / 5.3 [28]   19.92 / 4.91
150 epochs   20.44 / 5.25      19.43 / 4.80
around 1% gain in top-1 error
[Figure: class A distribution, class B distribution, and the rA + (1 − r)B mixture distribution.]
Small Fisher's criterion → overlap among distributions → large BC learning loss
Large Fisher's criterion → no overlap among distributions → small BC learning loss
[Figure: the same distributions in a less discriminative (overlapping) feature space and a more discriminative (separated) feature space.]
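As a quick illustration of the quantity involved, here is the two-class, one-dimensional form of Fisher's criterion on toy Gaussian data (names and data are illustrative, not from the paper):

```python
import numpy as np

def fishers_criterion(a, b):
    """Two-class Fisher's criterion for 1-D features: squared distance
    between class means divided by the sum of class variances."""
    return (a.mean() - b.mean()) ** 2 / (a.var() + b.var())

rng = np.random.default_rng(0)
overlap_a = rng.normal(0.0, 1.0, 1000)  # heavily overlapping classes
overlap_b = rng.normal(1.0, 1.0, 1000)
apart_a = rng.normal(0.0, 1.0, 1000)    # well-separated classes
apart_b = rng.normal(5.0, 1.0, 1000)

# overlap -> small criterion (large BC loss); separation -> large criterion
assert fishers_criterion(overlap_a, overlap_b) < fishers_criterion(apart_a, apart_b)
```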
Large correlation among classes → the mixture of classes A and B may be classified into class C → large BC learning loss
[Figure: decision boundaries for classes A, B, and C, with the rA + (1 − r)B mixture shown under large and small class correlation.]
Small correlation among classes → the mixture of classes A and B is not classified into class C → small BC learning loss
In classification, the class distributions should be uncorrelated, because the teaching signal is discrete.
[Figure: activations of the 10th layer of an 11-layer CNN trained on CIFAR-10, under standard learning and BC learning (ours).]
With BC learning, the distributions are more compact and more spherical than with standard learning, and Fisher's criterion is larger (1.97 vs. 1.76 for standard learning).
Adversarial Dropout Regularization. Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko. To appear in ICLR 2018.
Maximum Classifier Discrepancy for Unsupervised Domain Adaptation. Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, Tatsuya Harada. To appear in CVPR 2018 (oral presentation).
Learning from Limited Data
Problems
A supervised learning model needs many labeled examples.
Collecting them in various domains is costly.
Goal
Transfer knowledge from the source domain to the target domain.
Obtain a classifier that works well on the target domain.
Unsupervised Domain Adaptation (UDA)
Labeled examples are given only in the source domain. There are no labeled examples in the target domain.
Source domain: synthetic images (labeled). Target domain: real images (unlabeled).
Distribution matching based method
Problems
[Figure: a feature extractor feeds a category classifier (trained on labeled source data S) and a domain classifier (source S vs. target T); the source and target feature distributions before adaptation and after adaptation.]
Considering class-specific distributions: use the decision boundary to align the distributions.
[Figure: previous work vs. proposed method — source and target feature distributions before and after adaptation, with the decision boundary drawn in each case.]
[Figure: classes A and B with a decision boundary; two classifiers F1 and F2 on source and target features.]
The adaptation alternates two adversarial steps:
1. Maximize the discrepancy by training the two classifiers F1 and F2.
2. Minimize the discrepancy by training the feature extractor.
Discrepancy: the degree to which the two classifiers give different predictions on the same example. A target example on which F1 and F2 disagree has a large discrepancy.
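A minimal numpy sketch of this quantity, taking one common choice — the L1 distance between the two classifiers' probability outputs (the inputs below are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def discrepancy(logits1, logits2):
    """Mean absolute difference between the two classifiers'
    class-probability outputs on the same inputs (L1 discrepancy)."""
    return float(np.mean(np.abs(softmax(logits1) - softmax(logits2))))

# F1 and F2 agree on the first input and disagree on the second
agree = discrepancy(np.array([[5.0, 0.0, 0.0]]), np.array([[5.0, 0.0, 0.0]]))
disagree = discrepancy(np.array([[5.0, 0.0, 0.0]]), np.array([[0.0, 5.0, 0.0]]))
assert agree < disagree
```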
[Figure: the input passes through a feature generator into two classifiers F1 and F2 with classification losses L1class and L2class; training alternates between maximizing the discrepancy D by updating the classifiers and minimizing D by updating the feature generator.]
Algorithm
Step A: Train the feature generator and the classifiers F1, F2 to minimize the classification error on the source domain.
Step B: Fix the feature generator; train F1, F2 to minimize the source classification error while maximizing the discrepancy on the target domain.
Step C: Fix the classifiers F1, F2; train the feature generator to minimize the discrepancy on the target domain.
Steps A–C are repeated.
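Under assumed notation (G: feature generator; F1, F2: classifiers with probability outputs p1, p2; d: the discrepancy; Xs, Ys: labeled source data; Xt: unlabeled target data), the alternating steps above can be written as:

```latex
% Alternating objectives (notation assumed, not copied from the slides)
\begin{align*}
\text{Step A:}\quad &\min_{G, F_1, F_2} \ \mathcal{L}_{\mathrm{cls}}(X_s, Y_s)\\
\text{Step B:}\quad &\min_{F_1, F_2} \ \mathcal{L}_{\mathrm{cls}}(X_s, Y_s)
  - \mathbb{E}_{x_t \sim X_t}\big[\, d\big(p_1(x_t),\, p_2(x_t)\big) \big]\\
\text{Step C:}\quad &\min_{G} \ \mathbb{E}_{x_t \sim X_t}\big[\, d\big(p_1(x_t),\, p_2(x_t)\big) \big]
\end{align*}
```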
Theoretical background: for a hypothesis h, the expected error in the target domain is bounded by the expected error in the source domain, plus a divergence term between the domains, plus the shared error of the ideal joint hypothesis:
R_T(h) ≤ R_S(h) + (1/2) d_HΔH(S, T) + λ
Synthetic images to real images (12 classes). Fine-tune an ImageNet-pre-trained ResNet-101 [He et al., CVPR 2016]. Source: images, Target: images.
[Figure: example source (synthetic) images and target (real) images.]
Simulated images (GTA5) to real images (Cityscapes). Fine-tune ImageNet-pre-trained VGG and Dilated Residual Networks [Yu et al., 2017].
Calculate discrepancy pixel-wise
Evaluation by mean IoU (TP/(TP+FP+FN))
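The mean-IoU metric named above can be computed from per-class TP/FP/FN counts; a tiny Python sketch with toy numbers:

```python
def iou(tp, fp, fn):
    """Intersection over union for one class: TP / (TP + FP + FN)."""
    denom = tp + fp + fn
    return tp / denom if denom > 0 else 0.0

def mean_iou(per_class_counts):
    """Mean IoU over classes, given (TP, FP, FN) counts per class."""
    return sum(iou(*c) for c in per_class_counts) / len(per_class_counts)

# toy two-class example: one well-segmented class, one poorly segmented class
counts = [(90, 5, 5), (30, 40, 30)]
print(mean_iou(counts))  # (0.9 + 0.3) / 2 = 0.6
```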
[Figure: example images from GTA5 (source) and Cityscapes (target).]
[Figure: bar chart of per-class IoU (%) for road, sidewalk, building, wall, fence, pole, light, sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, and bicycle; "source only" baseline shown.]
[Figure: qualitative segmentation results — RGB input, ground truth, source-only model, and adapted model (ours).]
Between-class learning (BC learning)
Mix two training examples with a random ratio and train the model to output the mixing ratio.
Simple and easy to implement.
Can be introduced independently of existing techniques: network architectures, data augmentation schemes, optimizers, etc.
Unsupervised Domain Adaptation
The unsupervised domain adaptation method using classifier discrepancy is effective for both image classification and semantic segmentation.