CS839 Special Topics in AI: Deep Learning - Learning with Less Supervision



SLIDE 1

CS839 Special Topics in AI: Deep Learning

Learning with Less Supervision

Sharon Yixuan Li University of Wisconsin-Madison

October 29, 2020

SLIDE 2

Overview

  • Weakly Supervised Learning
  • Flickr100M
  • JFT300M (Google)
  • Instagram3B (Facebook)
  • Data augmentation
  • Human heuristics
  • Automated data augmentation
  • Self-supervised Learning
  • Pretext tasks (rotation, patches, colorization etc.)
  • Invariant vs. Covariant learning
  • Contrastive learning based framework (current SoTA)
SLIDE 3

Part I: Weakly Supervised Learning

SLIDE 4

Model Complexity Keeps Increasing

LeNet (LeCun et al. 1998): conv, conv, fc 120, fc 84, output 10

ResNet (He et al. 2016): >100 million parameters

SLIDE 5

[Sun et al. 2017]

SLIDE 6

Challenge: Limited labeled data

ImageNet: 1M images, ~a thousand annotation hours [Deng et al. 2009]

x1000: 1B images, ~a million annotation hours

SLIDE 7

Fully Supervised

CAT, DOG, FLOOR

Weakly Supervised

A CUTE CAT COUPLE #CAT

Unsupervised

???

Levels of Supervision

TRAINING AT SCALE

Example data sources: ImageNet (fully supervised), Instagram/Flickr (weakly supervised), crawled web images (unsupervised)

SLIDE 8

Noisy Data

Non-Visual Labels Missing Labels Incorrect Labels

#DOG #LOVE #HUSKY #CAT

SLIDE 9

Flickr 100M [Joulin et al. 2015]

SLIDE 10

JFT 300M [Sun et al. 2017]

SLIDE 11

Can we use billions of images with hashtags for pre-training?

[Mahajan et al. 2018]

SLIDE 12

Hashtag Selection

Synonyms of ImageNet labels: 1.5K hashtags, 1B images. Synonyms of nouns in WordNet: 17K hashtags, 3B images.

[Mahajan et al. 2018]

SLIDE 13

Network Architecture and Capacity

ResNeXt-101 32xCd

[Plots: number of parameters (x10^6) and FLOPs (x10^9) vs. group width C in {4, 8, 16, 32, 48} for ResNeXt-101 32xCd; Xie et al. 2016]

SLIDE 14

3.5B public Instagram images, 17K unique labels, large-capacity model (ResNeXt-101 32x48d), distributed training (350 GPUs)

85.1%

Largest Weakly Supervised Training

[Mahajan et al. 2018]

SLIDE 15

Results

SLIDE 16

* With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

Target task: ImageNet

Transfer Learning Performance

SLIDE 19

* With a bigger model, we even got 85.4% top-1 accuracy on ImageNet-1K.

Target task: ImageNet Target task: CUB-2011 & Places-365

Transfer Learning Performance

SLIDE 20

Models are surprisingly robust to label "noise"

Dataset: IG-1B-17k; Network: ResNeXt-101 32x16d

SLIDE 21

Matching hashtags to the target task helps (1.5K tags)


Effect of Model Capacity

Target task: ImageNet-1K

SLIDE 22

BiT Transfer [Kolesnikov et al. 2020]

SLIDE 23

Part II: Data Augmentation

SLIDE 24

“Quokka”

Figure credit: https://github.com/aleju/imgaug

Data Augmentation

SLIDE 25

Data Augmentation

Data

Load image and label (“cat”) → CNN

SLIDE 26

Data Augmentation

Data

Load image and label → Transformation function (TF) → CNN

SLIDE 27

Data Augmentation

  • Change the pixels without changing the labels
  • Training on transformed data improves generalization
  • VERY widely used

Transformation function (TF)

SLIDE 28

Example of Transformation Functions (TFs)

Original image Color jitter Horizontal flip Random crop
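As a quick illustration, two of these TFs can be written in a few lines of NumPy. This is a minimal sketch; the function names and image shapes are mine, not from the slides:

```python
import numpy as np

def horizontal_flip(img):
    """Flip an H x W x C image left-to-right; the label is unchanged."""
    return img[:, ::-1, :]

def random_crop(img, size, rng=None):
    """Cut a random size x size window out of an H x W x C image."""
    rng = rng or np.random.default_rng()
    h, w, _ = img.shape
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return img[top:top + size, left:left + size, :]

img = np.arange(4 * 4 * 3).reshape(4, 4, 3)
flipped = horizontal_flip(img)   # same shape, mirrored columns
crop = random_crop(img, 2)       # shape (2, 2, 3)
```

In practice one would use a library such as torchvision, which provides these operations as composable transforms.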

SLIDE 29

Heuristic Data Augmentation

Data Augmented data

TF sequences: TF1 ... TFL (e.g. rotation, flip), chosen by a human expert
SLIDE 30

Heuristic Data Augmentation

Data Augmented data

TF sequences: TF1 ... TFL (e.g. rotation, flip), chosen by a human expert

How to automatically learn the compositions and parameterizations of TFs?

SLIDE 31

TANDA

Generator (LSTM) produces TF sequences: TF1 ... TFL (e.g. rotation, flip)

Data Augmented data

[Ratner et al. 2017] Transformation Adversarial Networks for Data Augmentations

SLIDE 32

TANDA

Generator (LSTM) produces TF sequences: TF1 ... TFL (e.g. rotation, flip)

Discriminator: real or augmented?

Data Augmented data

[Ratner et al. 2017] Transformation Adversarial Networks for Data Augmentations

SLIDE 33

TANDA

[Ratner et al. 2017] Transformation Adversarial Networks for Data Augmentations. Generated MNIST samples.

[Bar chart, heuristic augmentation vs. TANDA: CIFAR-10 +2.1%, ACE (F1 score) +1.4, Medical Imaging +3.4%]

SLIDE 34

AutoAugment

[Cubuk et al. 2018]

SLIDE 35

AutoAugment

Controller (RNN) produces TF sequences: TF1 ... TFL (e.g. rotation, flip)

Discriminator: real or augmented?

Data → Augmented data

[Cubuk et al. 2018]

SLIDE 36

AutoAugment

Controller (RNN) produces TF sequences: TF1 ... TFL (e.g. rotation, flip)

End model is trained on the augmented data; its validation accuracy R is the reward

Data → Augmented data

[Cubuk et al. 2018] State-of-the-art performance on various benchmarks; however, the computational cost is very high.

SLIDE 37

RandAugment

[Cubuk et al. 2019]

Controller (RNN) produces TF sequences: TF1 ... TFL (e.g. rotation, flip)

End model: validation accuracy R

Data → Augmented data

SLIDE 38

RandAugment

[Cubuk et al. 2019]

Data → Augmented data via randomly sampled TFs: TF1 ... TFL

(1) Random sampling over the transformation functions
(2) Grid search over the parameters of each transformation

Outperforms AutoAugment
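The search space thus collapses to two scalars: the number of sampled ops and their shared magnitude. A toy sketch of the sampling step, with a hypothetical op pool standing in for the paper's set of image ops:

```python
import random

# Illustrative op pool; the real RandAugment draws from a larger set of
# image ops (rotate, shear, solarize, ...), all sharing one magnitude.
TF_POOL = ["rotate", "shear_x", "color", "posterize", "solarize"]

def randaugment_policy(n, m, rng=random):
    """Sample n ops uniformly at random, each applied at magnitude m.
    Only (n, m) remain as hyperparameters, so a small grid search suffices."""
    return [(rng.choice(TF_POOL), m) for _ in range(n)]

policy = randaugment_policy(n=2, m=9)   # two (op, magnitude) pairs
```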

SLIDE 39

Adversarial AutoAugment

[Zhang et al. 2019]

Adversarial controller (RNN) produces TF sequences (TF1 ... TFL, e.g. rotation, flip) to maximize the end model's training loss, while the end model is trained to minimize it; the training loss serves as the reward signal.

Data → Augmented data

12x reduction in computing cost on ImageNet compared to AutoAugment; 1.36% top-1 error on CIFAR-10 (new SoTA).

SLIDE 40

Uncertainty-based sampling augmentation

Users provide transformation functions (TFs): rotate, invert, cutout, mixup

Data → Augmented data via K randomly sampled compositions of TFs

The model selects the TFs that provide the most information during training; no policy learning required

[Wu et al. 2020]
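Among the TFs above, mixup is notable because it transforms the label as well as the image. A minimal NumPy sketch (the function name and default alpha are illustrative):

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.2, rng=None):
    """Convex-combine two examples and their one-hot labels using a
    Beta(alpha, alpha)-distributed mixing weight."""
    rng = rng or np.random.default_rng()
    lam = float(rng.beta(alpha, alpha))
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x_mix, y_mix = mixup(np.ones(4), np.array([1.0, 0.0]),
                     np.zeros(4), np.array([0.0, 1.0]))
```

The mixed label stays a valid distribution (its entries sum to 1), so the model is trained with the usual cross-entropy loss on soft targets.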

SLIDE 41

Empirical results: State of the art quality

CIFAR-10 CIFAR-100 SVHN

Improved the existing methods across domains

SoTA on CIFAR-10, CIFAR-100, and SVHN: 84.54% on CIFAR-100 using Wide-ResNet-28-10

  • Outperforming RandAugment (Cubuk et al. ’19) by 1.24%

Improved accuracy by 0.28 points on a text classification task

SLIDE 42

Check out the blog post series!

Automating the Art of Data Augmentation (Part I: Overview)
Automating the Art of Data Augmentation (Part II: Practical Methods)
Automating the Art of Data Augmentation (Part III: Theory)
Automating the Art of Data Augmentation (Part IV: New Direction)

SLIDE 43

Part III: Self-supervised Learning

SLIDE 44

Source: Yann LeCun’s talk

SLIDE 45

What if we could get labels for free for unlabeled data, and train on it in a supervised manner?

SLIDE 46

Pretext Tasks

SLIDE 47

[Gidaris et al. 2018]

Rotation
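The rotation labels come for free: each image yields four training examples for a 4-way classification task. A NumPy sketch of the labeling scheme (the function name is mine, not from the paper):

```python
import numpy as np

def rotation_pretext(img):
    """Return the four rotated copies of an H x W image together with
    their free labels 0..3, standing for 0, 90, 180, 270 degrees."""
    return [np.rot90(img, k) for k in range(4)], [0, 1, 2, 3]

img = np.arange(9).reshape(3, 3)
views, labels = rotation_pretext(img)   # a 4-way classification problem
```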


SLIDE 50

[Doersch et al., 2015]

Patches
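Here the free label is spatial: which of the eight neighboring positions a patch was taken from. A toy sketch over a 3x3 grid, ignoring details from the paper such as the gaps between patches:

```python
import numpy as np

def patch_pretext(img, rng=None):
    """Cut an image into a 3x3 grid; return the center patch, a random
    neighboring patch, and the neighbor's position index 0..7 as label."""
    rng = rng or np.random.default_rng()
    h, w = img.shape[0] // 3, img.shape[1] // 3
    cells = [img[i*h:(i+1)*h, j*w:(j+1)*w]
             for i in range(3) for j in range(3)]
    label = int(rng.integers(0, 8))
    neighbor = cells[label if label < 4 else label + 1]  # skip center (index 4)
    return cells[4], neighbor, label

center, neighbor, label = patch_pretext(np.arange(36).reshape(6, 6))
```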

SLIDE 51

Colorization

http://richzhang.github.io/colorization/

[Zhang et al. 2016]

SLIDE 52

Pretext Invariant Representation Learning (PIRL)

[Misra et al. 2019]

SLIDE 53

Pretext Invariant Representation Learning (PIRL)

[Misra et al. 2019] Positive pair Negative pairs

SLIDE 54

SimCLR

[Chen et al. 2020]
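SimCLR trains with the NT-Xent contrastive loss: two augmented views of the same image form a positive pair, and all other images in the batch serve as negatives. A NumPy sketch of the loss (variable names are mine; the real setup adds a projection head and large batches):

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss for paired embeddings z1, z2 of shape (N, D): row i of
    z1 and row i of z2 are positives; all other rows in the batch are
    negatives."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarities
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                    # exclude self-pairs
    n = len(z1)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = nt_xent(z, z)                       # identical views: low loss
loss_random = nt_xent(z, rng.normal(size=(8, 16))) # unrelated views: high loss
```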


SLIDE 57

Data Augmentation is the key

[Chen et al. 2020]

SLIDE 58

Unsupervised learning benefits more from bigger models

[Chen et al. 2020]

SLIDE 59

Summary

  • Weakly Supervised Learning
  • Flickr100M
  • JFT300M (Google)
  • Instagram3B (Facebook)
  • Data augmentation
  • Human heuristics
  • Automated data augmentation
  • Self-supervised Learning
  • Pretext tasks (rotation, patches, colorization etc.)
  • Invariant vs. Covariant learning
  • Contrastive learning based framework (current SoTA)
SLIDE 60

Questions?