Learning from Limited Data, The Univ. of Tokyo / RIKEN AIP, Tatsuya Harada (PowerPoint PPT Presentation)



SLIDE 1

Learning from Limited Data

The Univ. of Tokyo / RIKEN AIP Tatsuya Harada

GTC March 29, 2018

SLIDE 2

Contents

Background

Deep Learning (DL) is one of the most successful machine learning methods. However, DL generally requires a huge amount of annotated data, and annotation is very expensive.

Challenge

Obtaining High Quality Deep Neural Networks from limited data

Topics

  • Learning method for supervised learning from limited data
  • Unsupervised domain adaptation using classifier discrepancy

SLIDE 3

Between-class Learning

Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada
Learning from Between-class Examples for Deep Sound Recognition (to appear in ICLR 2018)
Between-class Learning for Image Classification (to appear in CVPR 2018)

Learning from Limited Data

  • Y. Tokozume

https://github.com/mil-tokyo/bc_learning_sound
https://github.com/mil-tokyo/bc_learning_image

SLIDE 4

Standard Supervised Learning


[Figure: one example (e.g. a dog image) is randomly selected and augmented from the training dataset and input to the model; the target output is Dog 1, Cat 0, Bird 0.]

1. Select one example from the training dataset.
2. Train the model to output 1 for the corresponding class and 0 for the other classes.

SLIDE 5

Between-class (BC) Learning

1. Select two training examples from different classes.
2. Mix those examples with a random ratio.
3. Train the model to output the mixing ratio and the mixed classes.

Proposed method

[Figure: two examples (a dog and a cat) are randomly selected and augmented from the training dataset, mixed, and input to the model; the model is trained with a KL-divergence loss to output Dog 0.7, Cat 0.3, Bird 0.]

At test time, we input a single example into the network.

Merits:
  • Generates infinite training data from limited data.
  • Learns a more discriminative feature space than standard learning.

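The selection and mixing steps above can be sketched in NumPy. This is an illustrative sketch, not the authors' code; the simple linear mix shown here is the basic form (the papers refine the mixing for sounds and images):

```python
import numpy as np

def bc_example(x1, label1, x2, label2, num_classes, rng):
    """Create one between-class training example: mix two inputs from
    different classes with a random ratio r, and return the ratio label."""
    r = rng.uniform()                 # random mixing ratio in (0, 1)
    x = r * x1 + (1.0 - r) * x2      # basic linear mix of the inputs
    t = np.zeros(num_classes)
    t[label1] = r                     # e.g. Dog: 0.7
    t[label2] = 1.0 - r               # e.g. Cat: 0.3
    return x, t

# The model is then trained with a KL-divergence loss between its
# softmax output and the ratio label t.
```

At test time a single, unmixed example is fed to the network, as the slide notes.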

SLIDE 6

BC learning for sounds

[Figure: two training examples, a dog sound with label (Dog 1, Cat 0, Bird 0) and a cat sound with label (Dog 0, Cat 1, Bird 0), are mixed with a random ratio into a single sound whose label carries the mixing ratio (Dog r, Cat 1 − r, Bird 0). G1, G2: sound pressure levels of the two sounds [dB].]
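A sketch of gain-aware mixing for sounds. The exact coefficient below is an assumption based on the stated use of sound pressure levels, not necessarily the authors' exact formula:

```python
import numpy as np

def mix_sounds(x1, x2, g1_db, g2_db, r):
    """Mix two waveforms x1, x2 with ratio r, compensating for their
    sound pressure levels g1_db, g2_db [dB]."""
    # The louder sound gets a smaller coefficient so that the audible
    # mixture matches the intended ratio r.
    p = 1.0 / (1.0 + 10.0 ** ((g1_db - g2_db) / 20.0) * (1.0 - r) / r)
    # Normalize so the mixture's energy stays comparable to the inputs.
    return (p * x1 + (1.0 - p) * x2) / np.sqrt(p ** 2 + (1.0 - p) ** 2)
```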

SLIDE 7

Results of Sound Recognition

① Various models ② Various datasets ③ Compatible with strong data augmentation ④ Surpass the human level

We can improve recognition performance for any sound network by applying BC learning.

SLIDE 8

BC Learning for Image

Images as waveforms: an image can be decomposed into a static component and a wave component. The static component would not be important, or could even have a bad effect, if CNNs treat input data as waveforms.

Proposal 1: mix two images directly (e.g. Dog 1.0 and Cat 1.0 mixed into Dog 0.5, Cat 0.5).

Proposal 2 (BC+): mix images as waveforms, removing the static component first.
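A sketch of BC+ mixing under the waveform view. Subtracting each image's mean and weighting by the images' standard deviations is my reading of the method, so treat the exact formula as an assumption:

```python
import numpy as np

def bc_plus_mix(x1, x2, r):
    """Mix two images as waveforms (BC+): subtract each image's static
    component (its mean) and weight by the images' standard deviations."""
    m1, m2 = x1.mean(), x2.mean()
    s1, s2 = x1.std(), x2.std()
    p = 1.0 / (1.0 + (s1 / s2) * (1.0 - r) / r)      # std-aware coefficient
    mixed = p * (x1 - m1) + (1.0 - p) * (x2 - m2)
    return mixed / np.sqrt(p ** 2 + (1.0 - p) ** 2)  # keep the energy stable
```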

SLIDE 9

Results on CIFAR


Our preliminary results were presented in ILSVRC2017 on July 26, 2017.

SLIDE 10

Results on ImageNet-1K

top-1 / top-5 validation error (%):
  100 epochs: Standard 20.4 / 5.3 [28]; BC (ours) 19.92 / 4.91
  150 epochs: Standard 20.44 / 5.25; BC (ours) 19.43 / 4.80

around 1% gain in top-1 error


Our preliminary results were presented in ILSVRC2017 on July 26, 2017.

SLIDE 11

How BC Learning Works


[Figure: class A, class B, and rA+(1−r)B distributions, in a less discriminative and in a more discriminative feature space.]

Small Fisher’s criterion → overlap among distributions → large BC learning loss.
Large Fisher’s criterion → no overlap among distributions → small BC learning loss.

SLIDE 12

How BC Learning Works

[Figure: classes A, B, C and the decision boundary, with the mixture rA+(1−r)B shown under large and under small inter-class correlation.]

Large correlation among classes → a mixture of classes A and B may be classified into class C → large BC learning loss.
Small correlation among classes → a mixture of classes A and B is not classified into class C → small BC learning loss.

In classification, the class distributions should be uncorrelated because the teaching signal is discrete.

SLIDE 13

Visualization using PCA

[Figure: PCA visualization of activations, standard learning vs. BC learning (ours). Activations of the 10th layer of an 11-layer CNN trained on CIFAR-10.]

  • Distributions are more compact than those from standard learning.
  • Distributions are spherical.
  • Larger Fisher’s criterion than that of standard learning (1.97 for BC learning vs. 1.76 for standard learning).
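Fisher's criterion for two classes can be computed as the squared distance between class means over the sum of within-class variances. A minimal sketch (the exact definition used on the slide is not shown in this transcript, so this is one standard formulation):

```python
import numpy as np

def fisher_criterion(a, b):
    """Two-class Fisher's criterion: between-class scatter divided by
    within-class scatter (larger = more discriminative features)."""
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    between = np.sum((mu_a - mu_b) ** 2)
    within = a.var(axis=0).sum() + b.var(axis=0).sum()
    return between / within
```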

SLIDE 14

Unsupervised Domain Adaptation using Classifier Discrepancy

Adversarial Dropout Regularization
Kuniaki Saito, Yoshitaka Ushiku, Tatsuya Harada, Kate Saenko
To appear in ICLR 2018

Maximum Classifier Discrepancy for Unsupervised Domain Adaptation
Kuniaki Saito, Kohei Watanabe, Yoshitaka Ushiku, Tatsuya Harada
To appear in CVPR 2018 (oral presentation)

Learning from Limited Data

  • K. Saito
SLIDE 15

Domain Adaptation (DA)

Problems

A supervised learning model needs many labeled examples, and collecting them in various domains is costly.

Goal

Transfer knowledge from the source domain to the target domain, and obtain a classifier that works well on the target domain.

Unsupervised Domain Adaptation (UDA)

Labeled examples are given only in the source domain. There are no labeled examples in the target domain.

[Figure: source domain = synthetic images (labeled); target domain = real images (unlabeled).]

SLIDE 16

Related Work

Distribution matching based method

  • Match distributions of source and target features
  • Domain Classifier (GAN) [Ganin et al., 2015]
  • Maximum Mean Discrepancy [Long et al., 2015]

Problems

  • Features are aligned by looking only at hidden features.
  • The relationship between the decision boundary and target examples is not considered.
  • These methods consider only the whole distribution.

[Figure: a feature extractor feeds a category classifier (trained on labeled source examples, S) and a domain classifier (distinguishing source from target, T). Before adaptation, the source and target distributions are separated; after adaptation, they are aligned across the decision boundary.]

SLIDE 17

Proposed Approach

  • Considering class-specific distributions
  • Using the decision boundary to align distributions

[Figure: previous work aligns the whole source and target distributions; the proposed method aligns target examples of class A and class B with the source by using the decision boundary (before adaptation vs. adapted).]

SLIDE 18

Key Idea

[Figure: source and target features with two classifiers F1 and F2. Training alternates: maximize the discrepancy by learning the classifiers, then minimize the discrepancy by learning the feature space.]

A discrepancy example is a target example that gets different predictions from the two classifiers.
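The discrepancy between the two classifiers F1 and F2 can be measured as the L1 distance between their class-probability outputs; a minimal sketch (one standard formulation, not necessarily the authors' exact loss):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=-1, keepdims=True)

def discrepancy(logits1, logits2):
    """L1 distance between the class probabilities of classifiers F1, F2.
    Zero when the classifiers agree; large on ambiguous target examples."""
    return float(np.abs(softmax(logits1) - softmax(logits2)).mean())
```

The classifiers are trained to make this large on target examples, and the feature generator is then trained to make it small.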

SLIDE 19

Network Architecture and Training

[Figure: the input is fed to a feature generator; two classifiers F1 and F2 produce predictions with classification losses L1class and L2class, plus a discrepancy loss D between their outputs. Maximize D by learning the classifiers; minimize D by learning the feature generator.]

Algorithm

  • 1. Fix the feature generator, and find classifiers F1, F2 that maximize D − (L1class + L2class).
  • 2. Find the feature generator and classifiers F1, F2 that minimize L1class + L2class (minimize classification error on the source domain).
  • 3. For l = 1, …, n: fix the classifiers F1, F2, and find a feature generator that minimizes D.

SLIDE 20

Why Discrepancy Method Works Well?


For a hypothesis h: (expected error in the target domain) ≤ (expected error in the source domain) + (divergence between the source and target domains) + (shared error of the ideal joint hypothesis).

SLIDE 21

Object Classification

Synthetic images to real images (12 classes). Finetune ResNet101 [He et al., CVPR 2016] pre-trained on ImageNet. Source: synthetic images; Target: real images.

Source (Synthetic images) Target (Real images)

SLIDE 22

Semantic Segmentation

  • Simulated images (GTA5) to real images (Cityscapes)
  • Finetuning of VGG and Dilated Residual Network [Yu et al., 2017], pre-trained on ImageNet
  • Discrepancy is calculated pixel-wise
  • Evaluation by mean IoU (TP / (TP + FP + FN))
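The mean IoU metric above can be sketched as follows (class-averaged TP / (TP + FP + FN); skipping classes absent from both prediction and ground truth is one common convention and an assumption here):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection-over-union over classes: TP / (TP + FP + FN)."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))   # predicted c, truly c
        fp = np.sum((pred == c) & (gt != c))   # predicted c, truly other
        fn = np.sum((pred != c) & (gt == c))   # missed pixels of class c
        if tp + fp + fn > 0:                   # skip classes absent everywhere
            ious.append(tp / (tp + fp + fn))
    return float(np.mean(ious))
```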

GTA5 (Source), Cityscapes (Target)

[Chart: per-class IoU (road, sidewalk, building, wall, fence, pole, light, sign, vegetation, terrain, sky, person, rider, car, truck, bus, train, motorcycle, bicycle), source only vs. ours.]

SLIDE 23

Qualitative Results

[Figure: qualitative segmentation results: RGB input, ground truth, source only, and adapted (ours).]

SLIDE 24

Take Home Messages

Between-class learning (BC learning)

  • Mix two training examples with a random ratio.
  • Train the model to output the mixing ratio.
  • Simple and easy to implement.
  • Can be introduced independently of previous techniques: network architectures, data augmentation schemes, optimizers, etc.

Unsupervised Domain Adaptation

An unsupervised domain adaptation method using classifier discrepancy is effective.