SLIDE 1

36th International Conference on Machine Learning (ICML), 2019

Bayesian Generative Active Deep Learning

Toan Tran^1, Thanh-Toan Do^2, Ian Reid^1, Gustavo Carneiro^1

^1 University of Adelaide, Australia   ^2 University of Liverpool

Wed Jun 12th 04:30 – 04:35 PM @ Room 201

Toan Tran (University of Adelaide) Long Beach, CA, USA Jun 12, 2019 1 / 13

slide-2
SLIDE 2

Introduction and Motivation

Deep learning (DL): the dominant machine learning methodology [Huang et al., 2017, Rajkomar et al., 2018].
Issue: labeling requires significant human effort, and large-scale training requires considerable computational resources [Sun et al., 2017].
How to address these training issues? Two popular approaches:

1. (Pool-based) active learning (AL): challenging to apply in DL, since AL may overfit the (small) informative training sets.

2. Data augmentation (DA): new samples are generated without regard to their informativeness ⇒ the training process takes longer than necessary and is relatively ineffective.

SLIDE 3

Introduction and Motivation

Main goals of this paper

- Propose a novel Bayesian generative active deep learning method
- Target the augmentation of the labeled data set with informative generated samples
- Key technical contribution: theoretically and empirically show the informativeness of the generated samples

Figure 1: Our proposed method, depicted as a VAE-ACGAN model

SLIDE 4

Introduction and Motivation

Generative adversarial active learning (GAAL) [Zhu and Bento, 2017]:
- Relies on an optimization problem to generate new informative samples
- Can generate rich, representative training data under the assumptions that:
  - the GAN model has been pre-trained, and
  - the optimization during generation is solved efficiently

Figure 2: Generative adversarial active learning (GAAL) [Zhu and Bento, 2017]

Comparison between our proposed method and GAAL [Zhu and Bento, 2017]:

                                  GAAL [Zhu and Bento, 2017]     Ours
Acquisition function              simple (binary classifier)     more effective (deep models)
Training of generator (G)         2-stage: GAN model is          G and C are jointly trained,
and classifier (C)                pre-trained                    allowing them to "co-evolve"
Classification results            not competitive enough

SLIDE 5

Methodology Bayesian Generative Active Deep Learning

Main technical contribution: combining BALD and BDA to generate new labeled samples that are informative for the training process.

Initial labeled data: D = {(x_i, y_i)}_{i=1}^{N}, where x_i ∈ X ⊆ R^d is a data sample labeled with y_i ∈ C = {1, 2, . . . , C} (C = # classes).

Bayesian active learning by disagreement (BALD) scheme [Gal et al., 2017, Houlsby et al., 2011]: the most informative sample x∗ is selected from the (unlabeled) pool data D_pool by [Houlsby et al., 2011]:

    x∗ = arg max_{x ∈ D_pool} a(x, M),    (1)

where the acquisition function a(x, M) is estimated by the Monte Carlo (MC) dropout method [Gal et al., 2017]:

    a(x, M) ≈ − Σ_c [ (1/T) Σ_t p̂_c^t ] log [ (1/T) Σ_t p̂_c^t ] + (1/T) Σ_{c,t} p̂_c^t log p̂_c^t,    (2)

where T is the number of dropout iterations and p̂^t = [p̂_1^t, . . . , p̂_C^t] = softmax(f(x; θ^t)), with f the network function parameterized by θ^t ∼ p(θ|D) at the t-th iteration.
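The MC-dropout estimate in (2) is simple to compute from the stacked softmax outputs of T stochastic forward passes; a minimal sketch (the `bald_acquisition` helper and its inputs are illustrative, not the authors' code):

```python
import numpy as np

def bald_acquisition(probs):
    """probs: array of shape (T, C) -- softmax outputs from T dropout
    forward passes on a single sample x.
    Returns the BALD score of Eq. (2): entropy of the mean prediction
    minus the mean entropy of the individual predictions."""
    eps = 1e-12  # numerical guard for log(0)
    mean_p = probs.mean(axis=0)                      # (1/T) sum_t p_c^t
    predictive_entropy = -np.sum(mean_p * np.log(mean_p + eps))
    expected_entropy = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return predictive_entropy - expected_entropy
```

The score is high exactly when the dropout samples disagree (each pass is confident, but about different classes), which is what makes x informative to acquire.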

SLIDE 6

Methodology Bayesian Generative Active Deep Learning

The generated sample x′ is

    x′ = g(e(x∗)),    (3)

where a variational autoencoder (VAE) [Kingma and Welling, 2013] provides the encoder e(·) and the decoder g(·).

VAE training minimizes the "reconstruction loss": if the number of training iterations is sufficiently large, we have

    ‖x′ − x∗‖ < ε,    (4)

for an arbitrarily small ε > 0 (see Fig. 3).

The labeled set is then updated as D ← D ∪ {(x∗, y∗), (x′, y∗)} and used for the next training iteration.

Figure 3: Reduction of ‖x′ − x∗‖ as training of the VAE model progresses [plot: reconstruction error (0.1–0.45) vs. number of iterations (up to 150)].
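The decay in Fig. 3 can be reproduced on a toy problem: as training minimizes the reconstruction loss, g(e(x∗)) approaches x∗, as in (4). Below, a fixed random linear encoder and a trained linear decoder stand in for the paper's VAE; all names and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 8, 4                              # data and latent dimensions
x_star = rng.normal(size=d)              # the acquired informative sample x*
E = rng.normal(scale=0.1, size=(k, d))   # encoder e(.), kept fixed here
G = rng.normal(scale=0.1, size=(d, k))   # decoder g(.), trained below

z = E @ x_star                           # latent code e(x*)
lr = 0.5 / (z @ z)                       # safe step size for this least-squares fit
errors = []
for _ in range(100):
    x_prime = G @ z                      # x' = g(e(x*)), Eq. (3)
    r = x_prime - x_star                 # reconstruction residual
    errors.append(float(np.linalg.norm(r)))
    G -= lr * np.outer(r, z)             # gradient step on ||x' - x*||^2

# errors shrinks toward 0 as iterations accumulate, mirroring Fig. 3
```

With this step size each update contracts the residual by a constant factor, so ‖x′ − x∗‖ decays geometrically; the actual VAE shows the analogous (noisier) decay in Fig. 3.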

SLIDE 7

Methodology Bayesian Generative Active Deep Learning

Key Question

What is the “information content” of the generated sample x′, as measured by a(x′, M)?

Proposition 2.1. Assuming that the gradient of the acquisition function a(x, M) with respect to the variable x, namely ∇_x a(x, M), exists, and that x∗ is an interior point of D_pool, then a(x′, M) ≈ a(x∗, M) (i.e., the absolute difference between these values is within a certain range). Consequently, the sample x′ generated from the most informative sample x∗ by (3) is also informative.
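The approximation follows from a first-order Taylor expansion of a(·, M) around x∗, combined with the reconstruction bound (4); a sketch of the argument in the notation above:

```latex
a(x', M) = a(x^*, M) + \nabla_x a(x^*, M)^\top (x' - x^*) + o(\lVert x' - x^* \rVert)
\;\Longrightarrow\;
\lvert a(x', M) - a(x^*, M) \rvert \le \lVert \nabla_x a(x^*, M) \rVert \, \varepsilon + o(\varepsilon).
```

Since (4) makes ε arbitrarily small for sufficiently long VAE training, the two acquisition values remain close.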

SLIDE 8

Implementation

Figure 4: Network architecture of our proposed model.

SLIDE 9

Experiments and Results

Classification performance is measured by top-1 accuracy as a function of the number of acquisition iterations and the percentage of training samples. Our proposed algorithm, active learning using “information-preserving” data augmentation (AL w. VAEACGAN), is compared with:

- Active learning using BDA (AL w. ACGAN)
- BALD [Gal et al., 2017] without data augmentation (AL without DA)
- BDA [Tran et al., 2017] without active learning (using full and partial training sets)
- Random selection

Benchmark data sets: MNIST [LeCun et al., 1998], CIFAR-10 and CIFAR-100 [Krizhevsky et al., 2012], and SVHN [Netzer et al., 2011].
Baseline classifiers: ResNet18 [He et al., 2016a] and ResNet18pa [He et al., 2016b]

SLIDE 10

Experiments and Results

[Figure 5 panels: test accuracy vs. # acquisition iterations (% of training sample) for Resnet18 (top row) and Resnet18pa (bottom row) on (a) MNIST, (b) CIFAR-10, (c) CIFAR-100, and (d) SVHN. Curves: BDA (full training), AL w. VAEACGAN, AL w. ACGAN, BDA (partial training), AL without DA, Random selection.]

Figure 5: Training and classification performance of the proposed Bayesian generative active learning (AL w. VAEACGAN) compared to other methods. This performance is measured as a function of the number of acquisition iterations and the respective percentage of samples from the original training set used for modeling.

SLIDE 11

Experiments and Results

Table I: Mean ± standard deviation of the classification accuracy on MNIST, CIFAR-10, and CIFAR-100 after 150 iterations over 3 runs

Data set   Model       AL w. VAEACGAN  AL w. ACGAN   AL w. PMDA    AL without DA  BDA (partial)  Random selection
MNIST      Resnet18    99.53 ± 0.05    99.45 ± 0.02  99.37 ± 0.15  99.33 ± 0.10   99.33 ± 0.04   99.00 ± 0.13
           Resnet18pa  99.68 ± 0.08    99.57 ± 0.07  99.49 ± 0.09  99.35 ± 0.11   99.35 ± 0.07   99.20 ± 0.12
CIFAR-10   Resnet18    87.63 ± 0.11    86.80 ± 0.45  82.17 ± 0.35  79.72 ± 0.19   85.08 ± 0.31   77.29 ± 0.23
           Resnet18pa  91.13 ± 0.10    90.70 ± 0.24  87.70 ± 0.39  85.51 ± 0.21   86.90 ± 0.27   80.69 ± 0.19
CIFAR-100  Resnet18    68.05 ± 0.17    66.50 ± 0.63  55.24 ± 0.57  50.57 ± 0.20   65.76 ± 0.40   49.67 ± 0.52
           Resnet18pa  69.69 ± 0.13    67.79 ± 0.76  59.67 ± 0.60  55.82 ± 0.31   65.79 ± 0.51   54.77 ± 0.29

(a) MNIST (b) CIFAR-10 (c) CIFAR-100 (d) SVHN

Figure 6: Images generated by our proposed (AL w. VAEACGAN) approach for each data set.

SLIDE 12

Discussion and Conclusions

- Results consistently show that our proposed approach outperforms the other methods. The classification gains of AL w. VAEACGAN are statistically significant w.r.t. BDA (partial training) on all those data sets, and w.r.t. AL w. ACGAN on CIFAR-{10, 100} for both models (p ≤ .05, two-sample t-test).
- We proposed a novel Bayesian generative active deep learning approach that is consistently more effective than data augmentation and active learning alone in several classification problems.
- Our approach is (active learning) model-agnostic; it can therefore be combined with current state-of-the-art active learning methods [Ducoffe and Precioso, 2018, Gissin and Shalev-Shwartz, 2018, Sener and Savarese, 2018].
- Future work:
  - Investigate how to generate samples directly using more complex acquisition functions
  - Improve the efficiency of our method (its empirical computational cost is slightly higher than BDA [Tran et al., 2017] and BALD [Gal et al., 2017, Houlsby et al., 2011])
  - Handle imbalanced data sets

SLIDE 13

Discussion and Conclusions

Poster presentation: Wed Jun 12th 06:30–09:00 PM @ Pacific Ballroom #263

SLIDE 14

References

Melanie Ducoffe and Frederic Precioso. Adversarial active learning for deep networks: a margin based approach. arXiv preprint arXiv:1802.09841, 2018.

Yarin Gal, Riashat Islam, and Zoubin Ghahramani. Deep Bayesian active learning with image data. In International Conference on Machine Learning, pages 1183–1192, 2017.

Daniel Gissin and Shai Shalev-Shwartz. Discriminative active learning. 2018.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016a.

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision, pages 630–645. Springer, 2016b.

Neil Houlsby, Ferenc Huszár, Zoubin Ghahramani, and Máté Lengyel. Bayesian active learning for classification and preference learning. arXiv preprint arXiv:1112.5745, 2011.

Gao Huang, Zhuang Liu, Kilian Q Weinberger, and Laurens van der Maaten. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, volume 1, page 3, 2017.

Diederik P. Kingma and Max Welling. Auto-encoding variational Bayes. CoRR, abs/1312.6114, 2013. URL http://dblp.uni-trier.de/db/journals/corr/corr1312.html#KingmaW13.

SLIDE 15

References

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.

Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, volume 2011, page 5, 2011.

Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M Dai, Nissan Hajaj, Peter J Liu, Xiaobing Liu, Mimi Sun, Patrik Sundberg, Hector Yee, et al. Scalable and accurate deep learning for electronic health records. arXiv preprint arXiv:1801.07860, 2018.

Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=H1aIuk-RW.

Chen Sun, Abhinav Shrivastava, Saurabh Singh, and Abhinav Gupta. Revisiting unreasonable effectiveness of data in deep learning era. In 2017 IEEE International Conference on Computer Vision (ICCV), pages 843–852. IEEE, 2017.

Toan Tran, Trung Pham, Gustavo Carneiro, Lyle Palmer, and Ian Reid. A Bayesian data augmentation approach for learning deep models. In Advances in Neural Information Processing Systems, pages 2797–2806, 2017.

SLIDE 16

Jia-Jie Zhu and Jose Bento. Generative adversarial active learning. arXiv preprint arXiv:1702.07956, 2017.
