

  1. CS 4803 / 7643: Deep Learning
Topics: Moving beyond supervised learning
Zsolt Kira, Georgia Tech

  2. Administrivia
• Projects!
  – Due April 30th
  – Template online
  – Can use MS Word, but follow the organization/rubric!
• No posters/presentations
(C) Zsolt Kira

  3. Project Note
• Important note:
  – Your project should include doing something beyond just downloading open-source code, fine-tuning, and showing the result
  – This can include:
    • implementation of additional approaches (even if leveraging open-source code),
    • a thorough analysis/investigation of some phenomenon or hypothesis,
    • theoretical analysis, or …
• When using external resources, provide references to anything you used in the write-up!
(C) Zsolt Kira

  4. Supervised Learning
● ML has been focused largely on this
● Lots of other problem settings are now coming up:
  ○ What if we have unlabeled data?
  ○ What if we have many datasets?
  ○ What if we only have one example per (new) class?

  5. But wait, there’s more!
• Transfer Learning
• Semi-supervised learning
• One/Few-shot learning
• Un/Self-Supervised Learning
• Domain adaptation
• Meta-Learning
• Zero-shot learning
• Continual / Lifelong-learning
• Multi-modal learning
• Multi-task learning
• Active learning
• …

Setting                 Source            Target              Shift Type
Semi-supervised         Single labeled    Single unlabeled    None
Domain Adaptation       Single labeled    Single unlabeled    Non-semantic
Domain Generalization   Multiple labeled  Unknown             Non-semantic
Cross-Task Transfer     Single labeled    Single unlabeled    Semantic
Few-Shot Learning       Single labeled    Single few-labeled  Semantic
Un/Self-Supervised      Single unlabeled  Many labeled        Both/Task

(C) Zsolt Kira

  6. An Entire Class on this! • Deep Unsupervised Learning class (UC Berkeley) • Link: – https://sites.google.com/view/berkeley-cs294-158-sp20/home (C) Zsolt Kira 6

  7. But wait, there’s more! • Transfer Learning • Semi-supervised learning • One/Few-shot learning • Un/Self-Supervised Learning • Domain adaptation • Meta-Learning • Zero-shot learning • Continual / Lifelong-learning • Multi-modal learning • Multi-task learning • Active learning • … (C) Zsolt Kira 7

  8. What is Semi-Supervised Learning? Supervised Learning Semi-Supervised Learning 8 Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  9. What is Semi-Supervised Learning? Supervised Learning Semi-Supervised Learning 9 Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  10. Semi-Supervised Learning
● Classification: Fully Supervised
  ○ Training data: (image, label); predict label for new images.
● What if we have a few labeled samples and many unlabeled samples? Labeling is generally time-consuming and expensive in certain domains.
● Semi-Supervised Learning
  ○ Training data: labeled data (image, label) and unlabeled data (image)
  ○ Goal: use the unlabeled data to make supervised learning better
  ○ Note: if we have lots of labeled data, this goal is much harder
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  11. Why Semi-Supervised Learning? (Slide: Thang Luong)
● My take: Reality might be in-between:
● Might be able to improve upon the high-labeled-data regime, but with exponentially increasing unlabeled data (of the proper type)
● See
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  12. Agenda
■ Core concepts
  ■ Confidence vs Entropy
    ■ Pseudo Labeling
    ■ Entropy minimization
    ■ Virtual Adversarial Training
  ■ Label Consistency
    ■ Make sure augmentations of the sample have the same class
    ■ Pi-Model, Temporal Ensembling, Mean Teacher
  ■ Regularization
    ■ Weight decay
    ■ Dropout
    ■ Data Augmentation (MixUp, CutOut)
■ Unsupervised Data Augmentation (UDA), MixMatch
■ Co-Training / Self-Training / Pseudo Labeling (Noisy Student)
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  13. Pseudo Labeling
● Simple idea:
  ○ Train on labeled data
  ○ Make predictions on unlabeled data
  ○ Add confident predictions to the training data
  ○ Can do all of these end-to-end (no need for separate stages)
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
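A minimal sketch of a combined pseudo-labeling training step in PyTorch-style Python. The names (`model`, the batch variables) and the 0.95 confidence threshold are illustrative assumptions, not the exact recipe from the slides.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, labeled_batch, unlabeled_batch, optimizer, threshold=0.95):
    (x_l, y_l), x_u = labeled_batch, unlabeled_batch

    # Supervised loss on the labeled data.
    loss_sup = F.cross_entropy(model(x_l), y_l)

    # Predict on unlabeled data; keep only confident predictions as hard pseudo-labels.
    with torch.no_grad():
        probs = F.softmax(model(x_u), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = (conf >= threshold).float()   # trust only confident predictions

    # Unsupervised loss: train toward the pseudo-labels, masked per example.
    loss_unsup = (F.cross_entropy(model(x_u), pseudo_y, reduction="none") * mask).mean()

    loss = loss_sup + loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```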

  14. Issue: Confidences on New Data
● Predictions on unlabeled data may be too flat (high entropy)
● Solution: Entropy minimization
● Several ways to achieve this
  ○ Explicit loss
  ○ Sharpening function (e.g. temperature scaling)
Image Credit: Figure modified from MixMatch paper
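A hedged sketch of the two options mentioned above: an explicit entropy-minimization loss, and a temperature-based sharpening function of the form used in MixMatch (T < 1 sharpens). The temperature value is illustrative.

```python
import torch

def entropy_loss(logits):
    # Mean entropy of the predicted distributions; adding this to the training
    # loss penalizes flat (high-entropy) predictions on unlabeled data.
    p = torch.softmax(logits, dim=1)
    return -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1).mean()

def sharpen(p, T=0.5):
    # Raise class probabilities to the power 1/T and renormalize; T < 1 sharpens.
    p_sharp = p ** (1.0 / T)
    return p_sharp / p_sharp.sum(dim=1, keepdim=True)
```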

  15. Label Consistency with Data Augmentation 15

  16. Label Consistency with Data Augmentation Could be Unlabeled or Labeled 16

  17. Label Consistency with Data Augmentation 17

  18. Label Consistency with Data Augmentation Make sure that the logits are similar 18
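A hedged sketch of a label-consistency term: two random augmentations of the same (possibly unlabeled) image should receive similar predictions. `augment` is a placeholder for any stochastic augmentation pipeline; the MSE form follows the Pi-model style, though KL divergence is also common.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    # Predictions on two independently augmented views of the same batch.
    p1 = F.softmax(model(augment(x)), dim=1)
    p2 = F.softmax(model(augment(x)), dim=1)
    # Mean squared error between the two predicted distributions.
    return ((p1 - p2) ** 2).sum(dim=1).mean()
```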

  19. More Data Augmentation -> Regularization 19

  20. Realistic Evaluation of Semi-Supervised Learning 20

  21. Outline
■ Realistic Evaluation of Semi-Supervised Learning
■ pi-model
■ Temporal Ensembling
■ Mean Teacher
■ Virtual Adversarial Training

  22. pi-Model (from “Temporal Ensembling for Semi-Supervised Learning”) 22

  23. pi-Model (from “Temporal Ensembling for Semi-Supervised Learning”) 23
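A hedged sketch of the Mean Teacher variant of this consistency idea: the "teacher" is an exponential moving average (EMA) of the student's weights, and the student is trained to match the teacher's predictions on differently augmented views. The EMA rate `alpha` is an illustrative hyperparameter.

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    # Teacher starts as a frozen copy of the student.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.99):
    # Exponential moving average of the student weights.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(alpha).add_(p_s, alpha=1 - alpha)

def mean_teacher_consistency(student, teacher, x_aug1, x_aug2):
    # Student prediction should match the (detached) teacher prediction.
    with torch.no_grad():
        target = F.softmax(teacher(x_aug1), dim=1)
    pred = F.softmax(student(x_aug2), dim=1)
    return ((pred - target) ** 2).sum(dim=1).mean()
```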

  24. Comparison 24

  25. Comparison 25

  26. Varying number of labels 26

  27. Class Distribution Mismatch 27

  28. MixMatch 28

  29. MixMatch 29

  30. MixMatch MixUp 30
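A hedged sketch of MixUp, the ingredient MixMatch borrows here: train on convex combinations of pairs of examples and of their (soft) labels. The Beta parameter 0.75 is an illustrative hyperparameter.

```python
import numpy as np
import torch

def mixup(x, y_onehot, alpha=0.75):
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1 - lam)                  # MixMatch keeps the larger weight on x
    perm = torch.randperm(x.size(0))         # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```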

  31. MixMatch 31

  32. MixMatch 32

  33. MixMatch 33

  34. MixMatch 34

  35. FixMatch 35
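A hedged FixMatch-style sketch of the unlabeled loss: pseudo-label a weakly augmented view, keep the label only if the confidence clears a threshold, and train the strongly augmented view toward that hard label. `weak_aug` and `strong_aug` are placeholder augmentation functions; the 0.95 threshold is illustrative.

```python
import torch
import torch.nn.functional as F

def fixmatch_unsup_loss(model, x_u, weak_aug, strong_aug, threshold=0.95):
    # Pseudo-label from the weakly augmented view (no gradients).
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_u)), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = (conf >= threshold).float()   # only confident pseudo-labels count

    # Train the strongly augmented view toward the pseudo-label.
    logits_strong = model(strong_aug(x_u))
    per_example = F.cross_entropy(logits_strong, pseudo_y, reduction="none")
    return (per_example * mask).mean()
```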

  36. FixMatch - Results 36

  37. But wait, there’s more! • Transfer Learning • Semi-supervised learning • One/Few-shot learning • Un/Self-Supervised Learning • Domain adaptation • Meta-Learning • Zero-shot learning • Continual / Lifelong-learning • Multi-modal learning • Multi-task learning • Active learning • … (C) Zsolt Kira 37

  38. Few-Shot Learning (C) Zsolt Kira 38 Slide Credit: Hugo Larochelle

  39. Few-Shot Learning (C) Zsolt Kira 39 Slide Credit: Hugo Larochelle

  40. Normal Approach
• Do what we always do: Fine-tuning
  – Train classifier on base classes
  – Freeze features
  – Learn classifier weights for new classes using the small amount of labeled data (during “query” time!)
A Closer Look at Few-shot Classification, Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang
(C) Zsolt Kira
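A hedged sketch of this baseline: freeze the backbone trained on the base classes and fit only a new linear classifier on the few labeled support examples of the novel classes. All names, the learning rate, and the step count are illustrative assumptions.

```python
import torch
import torch.nn as nn

def finetune_new_classes(backbone, support_x, support_y, n_new_classes, steps=100):
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)              # freeze the pretrained feature extractor

    with torch.no_grad():
        feats = backbone(support_x)          # (n_support, feat_dim)

    clf = nn.Linear(feats.size(1), n_new_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.01)
    for _ in range(steps):                   # fit only the new classifier head
        loss = nn.functional.cross_entropy(clf(feats), support_y)
        opt.zero_grad(); loss.backward(); opt.step()
    return clf
```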

  41. Cons of Normal Approach
• The training we do on the base classes does not take the task into account
• No notion that we will be performing a bunch of N-way tests
• Idea: simulate what we will see during test time
(C) Zsolt Kira

  42. Meta-Training Approach
• Set up a set of smaller tasks during training that simulate what we will be doing during testing
  – Can optionally pre-train features on held-out base classes (not typical)
• Testing stage is now the same, but with new classes
https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/
(C) Zsolt Kira
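A hedged sketch of episodic ("meta-training") sampling: each training step draws a small N-way, K-shot task from the base classes so that training mirrors the N-way tests we will face on novel classes. `images_by_class` and the episode sizes are illustrative assumptions.

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15):
    # Pick N classes, then K support and Q query examples per class.
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query   # adapt on support, evaluate/update on query
```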

  43. Meta-Learning Approaches • Learning a model conditioned on support set (C) Zsolt Kira 43

  44. More Sophisticated Meta-Learning Approaches
• Learn gradient descent:
  – Parameter initialization and update rules
  – Output:
    • Parameter initialization
    • Meta-learner that decides how to update parameters
• Learn just an initialization and use normal gradient descent (MAML)
  – Output:
    • Just parameter initialization!
    • We are using SGD
(C) Dhruv Batra & Zsolt Kira

  45. Meta-Learner
• How to parametrize learning algorithms?
• Two approaches to defining a meta-learner
  – Take inspiration from a known learning algorithm
    • kNN/kernel machine: Matching Networks (Vinyals et al., 2016)
    • Gaussian classifier: Prototypical Networks (Snell et al., 2017)
    • Gradient descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al., 2017)
  – Derive it from a black-box neural network
    • MANN (Santoro et al., 2016)
    • SNAIL (Mishra et al., 2018)
(C) Zsolt Kira; Slide Credit: Hugo Larochelle
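A hedged sketch of one of the approaches named above, a Prototypical Network episode: each class prototype is the mean embedding of its support examples, and query points are classified by negative squared distance to the prototypes. `embed` and the episode tensors are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(embed, support_x, support_y, query_x, query_y, n_way):
    z_s = embed(support_x)                               # (n_support, d)
    z_q = embed(query_x)                                 # (n_query, d)

    # One prototype per class: mean of that class's support embeddings.
    prototypes = torch.stack([z_s[support_y == c].mean(dim=0) for c in range(n_way)])

    # Logits = negative squared Euclidean distance to each prototype.
    logits = -torch.cdist(z_q, prototypes) ** 2
    return F.cross_entropy(logits, query_y)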

  46. More Sophisticated Meta-Learning Approaches
• Learn gradient descent:
  – Parameter initialization and update rules
  – Output:
    • Parameter initialization
    • Meta-learner that decides how to update parameters
• Learn just an initialization and use normal gradient descent (MAML)
  – Output:
    • Just parameter initialization!
    • We are using SGD
(C) Zsolt Kira

  47. Meta-Learner LSTM (C) Zsolt Kira 47 Slide Credit: Hugo Larochelle

  48. Meta-Learner LSTM (C) Zsolt Kira 48 Slide Credit: Hugo Larochelle

  49. Meta-Learner LSTM (C) Zsolt Kira 49 Slide Credit: Hugo Larochelle

  50. Meta-Learner LSTM (C) Zsolt Kira 50 Slide Credit: Hugo Larochelle

  51. Model-Agnostic Meta-Learning (MAML) (C) Zsolt Kira 53 Slide Credit: Hugo Larochelle

  52. Model-Agnostic Meta-Learning (MAML) (C) Zsolt Kira 55 Slide Credit: Sergey Levine

  53. Model-Agnostic Meta-Learning (MAML) (C) Zsolt Kira 56 Slide Credit: Sergey Levine
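A hedged sketch of MAML's two loops: adapt a copy of the shared initialization with a few SGD steps on each task's support set (inner loop), then update the initialization from the adapted models' query losses (outer loop). Full MAML backpropagates through the inner updates; this simplification corresponds to the first-order variant, and all names and hyperparameters are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def maml_outer_step(model, meta_opt, tasks, inner_lr=0.01, inner_steps=1):
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        fast = copy.deepcopy(model)                       # task-specific copy of the init
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                      # inner loop: adapt on support set
            loss = F.cross_entropy(fast(support_x), support_y)
            inner_opt.zero_grad(); loss.backward(); inner_opt.step()

        # Outer loop: gradient of the adapted model's query loss, accumulated
        # onto the initialization (first-order approximation).
        query_loss = F.cross_entropy(fast(query_x), query_y)
        grads = torch.autograd.grad(query_loss, fast.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```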

  54. Comparison (C) Zsolt Kira 57 Slide Credit: Sergey Levine
