

SLIDE 1

CS 4803 / 7643: Deep Learning

Zsolt Kira, Georgia Tech

Topics:

– Low-label ML Formulations

SLIDE 2

Administrativia

  • Projects!
  • Project Check-in due April 11th
    – Will be graded pass/fail; if you fail, you can address the issues
    – Counts for 5 points of the project score
  • Poster due date moved to April 23rd (last day of class)
    – No presentations
  • Final submission due date April 30th

SLIDE 3

Types of Learning

  • Important note:
    – Your project should include doing something beyond just downloading open-source code and tuning hyperparameters.
    – This can include:
      • implementation of additional approaches (if leveraging open-source code),
      • theoretical analysis, or
      • a thorough investigation of some phenomena.
  • When using external resources, provide references to anything you used in the write-up!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 4

But wait, there’s more!

  • Transfer Learning
  • Domain adaptation
  • Semi-supervised learning
  • Zero-shot learning
  • One/Few-shot learning
  • Meta-Learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning


SLIDE 5

Transfer Learning


A Survey on Transfer Learning, Sinno Jialin Pan and Qiang Yang (IEEE)

SLIDE 6

Taskonomy

Builds a graph of transferability between computer vision tasks:

  • 1. Collect a dataset of 4 million input images and labels for 26 vision tasks
    a. Surface normals, depth estimation, segmentation, 2D keypoints, 3D pose estimation
  • 2. Train a convolutional autoencoder architecture for each task

http://taskonomy.stanford.edu/

Slide Credit: Camilo & Higuera
Taskonomy: Disentangling Task Transfer Learning, Amir R. Zamir, Alexander Sax*, William B. Shen*, Leonidas Guibas, Jitendra Malik, Silvio Savarese

SLIDE 7

Taskonomy

Builds a graph of transferability between computer vision tasks:

  • 3. Transferability obtained by the Analytic Hierarchy Process (from pairwise comparisons between all possible sources for each target task)
  • 4. Final graph obtained by subgraph selection optimization (best performance from a limited set of source tasks): the transfer policy
  • Empirical study of performance and data-efficiency gains from transfer using different datasets (Places and ImageNet)

Slide Credit: Camilo & Higuera
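
The transfer-policy step lends itself to a small illustration. Below is a toy Python sketch, not the authors' code: the paper solves this step with binary integer programming, and `transfer_score` here simply stands in for the AHP-derived affinities. It picks a budget-limited set of source tasks so that every target gets its best available source.

```python
from itertools import combinations

def best_transfer_policy(tasks, transfer_score, budget):
    """Brute-force stand-in for step 4: choose `budget` source tasks that
    maximize total transferability, assigning each target its best source.
    transfer_score[src][tgt] is an assumed normalized transfer affinity."""
    best_set, best_total = None, float("-inf")
    for sources in combinations(tasks, budget):
        # Each target task is served by its single best chosen source.
        total = sum(max(transfer_score[s][t] for s in sources) for t in tasks)
        if total > best_total:
            best_set, best_total = set(sources), total
    return best_set
```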

SLIDE 8

Taskonomy

Slide Credit: Camilo & Higuera

SLIDE 9

But wait, there’s more!

  • Transfer Learning
  • Domain adaptation
  • Semi-supervised learning
  • Zero-shot learning
  • One/Few-shot learning
  • Meta-Learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning


SLIDE 10

Reducing Label Requirements

  • Alternative solution to gathering more data: exploit other sources of data that are imperfect but plentiful
    – Unlabeled data (unsupervised learning)
    – Multi-modal data (multimodal learning)
    – Multi-domain data (transfer learning, domain adaptation)

SLIDE 11

Few-Shot Learning

Slide Credit: Hugo Larochelle

SLIDE 12

Few-Shot Learning

Slide Credit: Hugo Larochelle

SLIDE 13

Few-Shot Learning


  • Let’s attack the problem of few-shot learning directly
    – we want to design a learning algorithm A that outputs good parameters θ of a model M when fed a small dataset Dtrain = {(x_i, y_i)}_{i=1..N}
  • Idea: let’s learn that algorithm A, end-to-end
    – this is known as meta-learning, or learning to learn
  • Rather than features, in few-shot learning we aim to transfer the complete training of the model to new datasets (not just the features or an initialization)
    – ideally there should be no human involved in producing a model for a new dataset

Slide Credit: Hugo Larochelle
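
To make the setup concrete, here is a minimal sketch of how such a small Dtrain (the "support set") and its accompanying test set (the "query set") are typically sampled in N-way, k-shot benchmarks; `data_by_class` is an assumed dict mapping each class to its list of examples (each list must hold at least k_shot + n_query items).

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
    """Build one N-way, k-shot episode: a tiny Dtrain (support set)
    and a Dtest (query set) drawn from the same N sampled classes."""
    classes = random.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```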

SLIDE 14

Prior Methods


  • One-shot learning has been studied before
    – One-shot learning of object categories (2006), Fei-Fei Li, Rob Fergus, and Pietro Perona
    – Knowledge transfer in learning to recognize visual object classes (2004), Fei-Fei Li
    – Object classification from a single example utilizing class relevance pseudo-metrics (2004), Michael Fink
    – Cross-generalization: learning novel classes from a single example by feature replacement (2005), Evgeniy Bart and Shimon Ullman
  • These largely relied on hand-engineered features
    – with recent progress in end-to-end deep learning, we hope to learn a representation better suited for few-shot learning

Slide Credit: Hugo Larochelle

SLIDE 15

Prior Meta-Learning Methods


  • Early work on learning an update rule
    – Learning a synaptic learning rule (1990), Yoshua Bengio, Samy Bengio, and Jocelyn Cloutier
    – The Evolution of Learning: An Experiment in Genetic Connectionism (1990), David Chalmers
    – On the search for new learning rules for ANNs (1995), Samy Bengio, Yoshua Bengio, and Jocelyn Cloutier
  • Early work on recurrent networks modifying their weights
    – Learning to control fast-weight memories: An alternative to dynamic recurrent networks (1992), Jürgen Schmidhuber
    – A neural network that embeds its own meta-levels (1993), Jürgen Schmidhuber

Slide Credit: Hugo Larochelle

SLIDE 16

Related Work: Meta-Learning


  • Training a recurrent neural network to optimize
    – it outputs the update, so it can decide to do something other than gradient descent
  • Learning to learn by gradient descent by gradient descent (2016), Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas
  • Learning to learn using gradient descent (2001), Sepp Hochreiter, A. Steven Younger, and Peter R. Conwell

Slide Credit: Hugo Larochelle

SLIDE 17

Related Work: Meta-Learning


  • Hyper-parameter optimization
    – the idea of learning the learning rates and the initialization conditions
  • Gradient-based hyperparameter optimization through reversible learning (2015), Dougal Maclaurin, David Duvenaud, and Ryan P. Adams

Slide Credit: Hugo Larochelle

SLIDE 18

Related Work: Meta-Learning


  • AutoML (Bayesian optimization, reinforcement learning)
  • Neural Architecture Search with Reinforcement Learning (2017), Barret Zoph and Quoc Le

Slide Credit: Hugo Larochelle

SLIDE 19

Meta-Learning


  • Learning algorithm A
    – input: training set
    – output: parameters θ of model M (the learner)
    – objective: good performance on test set
  • Meta-learning algorithm
    – input: meta-training set of episodes
    – output: parameters Θ of algorithm A (the meta-learner)
    – objective: good performance on meta-test set

Slide Credit: Hugo Larochelle
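
Putting this nomenclature together, a generic episodic meta-training loop looks roughly like the sketch below (PyTorch). This is a sketch, not a specific paper's algorithm: `adapt` and `loss` are hypothetical methods standing in for whatever a particular meta-learner does with an episode, and `sample_episode` is the earlier sketch.

```python
import torch

def meta_train(meta_learner, data_by_class, n_episodes, lr=1e-3):
    """Generic episodic meta-training: optimize the meta-learner's
    parameters Θ so that the adapted learner does well on each
    episode's test (query) set."""
    opt = torch.optim.Adam(meta_learner.parameters(), lr=lr)
    for _ in range(n_episodes):
        support, query = sample_episode(data_by_class)
        learner = meta_learner.adapt(support)  # A(Dtrain; Θ) -> learner with params θ
        loss = learner.loss(query)             # evaluate θ on the episode's test set
        opt.zero_grad()
        loss.backward()                        # differentiate through the adaptation
        opt.step()                             # update Θ
    return meta_learner
```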

SLIDE 20

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 21

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 22

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 23

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 24

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 25

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 26

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 27

Meta-Learning Nomenclature

Slide Credit: Hugo Larochelle

SLIDE 28

Meta-Learning Nomenclature

  • Assuming a probabilistic model M over labels, the cost per episode can become the negative log-likelihood of the episode's test labels (see the reconstruction below)
  • Depending on the choice of meta-learner, this cost will take a different form

Slide Credit: Hugo Larochelle
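
The slide's equation isn't reproduced in this transcript. Following the nomenclature above (learner parameters θ produced by algorithm A from Dtrain, meta-learner parameters Θ), a plausible reconstruction of the per-episode cost is:

```latex
\mathcal{C}(\Theta) \;=\; \sum_{(x,\,y)\,\in\,D_{\text{test}}} -\log p_{\theta}\!\left(y \mid x\right),
\qquad \theta = A\!\left(D_{\text{train}};\, \Theta\right)
```

i.e., the negative log-likelihood that the adapted learner assigns to the episode's test labels.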

SLIDE 29

Meta-Learner

  • How to parametrize learning algorithms?
  • Two approaches to defining a meta-learner
    – Take inspiration from a known learning algorithm
      • kNN/kernel machine: Matching Networks (Vinyals et al. 2016)
      • Gaussian classifier: Prototypical Networks (Snell et al. 2017)
      • Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al. 2017)
    – Derive it from a black-box neural network
      • MANN (Santoro et al. 2016)
      • SNAIL (Mishra et al. 2018)

Slide Credit: Hugo Larochelle

SLIDE 30

Meta-Learner

  • How to parametrize learning algorithms?
  • Two approaches to defining a meta-learner
    – Take inspiration from a known learning algorithm
      • kNN/kernel machine: Matching Networks (Vinyals et al. 2016)
      • Gaussian classifier: Prototypical Networks (Snell et al. 2017)
      • Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al. 2017)
    – Derive it from a black-box neural network
      • MANN (Santoro et al. 2016)
      • SNAIL (Mishra et al. 2018)

Slide Credit: Hugo Larochelle

SLIDE 31

Matching Networks

Slide Credit: Hugo Larochelle
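
The Matching Networks figures aren't reproduced here, but the core computation is compact enough to sketch. This is a simplified version (it omits the paper's full-context embeddings); `embed` is an assumed encoder network.

```python
import torch
import torch.nn.functional as F

def matching_net_predict(embed, support_x, support_y, query_x, n_way):
    """Matching-Networks-style prediction: each query's label distribution
    is a cosine-similarity-weighted average of the support-set labels."""
    s = F.normalize(embed(support_x), dim=1)      # (n_support, d)
    q = F.normalize(embed(query_x), dim=1)        # (n_query, d)
    attention = F.softmax(q @ s.t(), dim=1)       # attention over support items
    labels = F.one_hot(support_y, n_way).float()  # (n_support, n_way)
    return attention @ labels                     # (n_query, n_way) probabilities
```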

SLIDE 32

Prototypical Networks

Slide Credit: Hugo Larochelle

SLIDE 33

Prototypical Networks

Slide Credit: Hugo Larochelle
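
Again the figures are missing, so here is a minimal sketch of the Prototypical Networks computation (Snell et al. 2017): class prototypes are mean support embeddings, and queries are scored by negative squared Euclidean distance to each prototype (`embed` is an assumed encoder).

```python
import torch

def proto_net_logits(embed, support_x, support_y, query_x, n_way):
    """Prototypical Networks in a nutshell: prototype = mean embedding of
    a class's support examples; classify queries by distance to prototypes."""
    z_s = embed(support_x)                                         # (n_support, d)
    z_q = embed(query_x)                                           # (n_query, d)
    prototypes = torch.stack(
        [z_s[support_y == c].mean(dim=0) for c in range(n_way)])   # (n_way, d)
    return -torch.cdist(z_q, prototypes) ** 2                      # feed to softmax/CE
```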

SLIDE 34

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 35

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 36

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 37

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 38

Meta-Learning Algorithm

SLIDE 39

Meta-Learner LSTM

Slide Credit: Hugo Larochelle
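
The Meta-Learner LSTM figures aren't reproduced here; the core idea of Ravi & Larochelle (2017) is to treat the learner's parameters as the LSTM cell state, with the candidate cell value being the negative gradient:

```latex
\theta_t \;=\; f_t \odot \theta_{t-1} \;+\; i_t \odot \tilde{\theta}_t,
\qquad \tilde{\theta}_t \;=\; -\nabla_{\theta_{t-1}} \mathcal{L}_t
```

Setting f_t = 1 and i_t = α recovers plain gradient descent; the meta-learner instead learns to produce the gates f_t and i_t (from the loss, the gradient, and the current parameters), i.e., it learns the update rule itself.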

SLIDE 40

Model-Agnostic Meta-Learning (MAML)

Slide Credit: Hugo Larochelle

SLIDE 41

Model-Agnostic Meta-Learning (MAML)

Slide Credit: Sergey Levine

SLIDE 42

Model-Agnostic Meta-Learning (MAML)

Slide Credit: Sergey Levine
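
The MAML figures are likewise missing; the two-level structure can be sketched in a few lines of PyTorch. This is a sketch (single inner step, no batching over tasks), and `functional_forward` is a hypothetical helper that runs the model with an explicitly supplied parameter list.

```python
import torch

def maml_episode_loss(model, functional_forward, loss_fn,
                      support, query, inner_lr=0.01):
    """One MAML inner/outer computation on a single episode."""
    (x_s, y_s), (x_q, y_q) = support, query
    # Inner loop: one gradient step on the support set, keeping the graph
    # so the outer loss can differentiate through the adaptation.
    inner_loss = loss_fn(model(x_s), y_s)
    grads = torch.autograd.grad(inner_loss, list(model.parameters()),
                                create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]
    # Outer objective: loss of the adapted parameters θ' on the query set;
    # backpropagating it updates the initialization θ (the meta-parameters).
    return loss_fn(functional_forward(model, x_q, adapted), y_q)
```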

SLIDE 43

Comparison

Slide Credit: Sergey Levine

SLIDE 44

Meta-Learner

  • How to parametrize learning algorithms?
  • Two approaches to defining a meta-learner
    – Take inspiration from a known learning algorithm
      • kNN/kernel machine: Matching Networks (Vinyals et al. 2016)
      • Gaussian classifier: Prototypical Networks (Snell et al. 2017)
      • Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al. 2017)
    – Derive it from a black-box neural network
      • MANN (Santoro et al. 2016)
      • SNAIL (Mishra et al. 2018)

Slide Credit: Hugo Larochelle

SLIDE 45

Black-Box Meta-Learner

Slide Credit: Hugo Larochelle

SLIDE 46

Memory-Augmented Neural Network

Slide Credit: Hugo Larochelle

SLIDE 47

Experiments

Slide Credit: Hugo Larochelle

SLIDE 48

Experiments

Slide Credit: Hugo Larochelle

SLIDE 49

Extensions and Variations

Slide Credit: Hugo Larochelle

SLIDE 50

But beware

Slide Credit: Hugo Larochelle
A Closer Look at Few-shot Classification, Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang

SLIDE 51

SLIDE 52

Distribution Shift

  • What if there is a distribution shift (cross-domain)?
  • Lesson: Methods that are successful within-domain might be worse across domains!

SLIDE 53

Distribution Shift

SLIDE 54

Random Task Proposals

SLIDE 55

Does it Work?

SLIDE 56

Discussions


  • What is the right definition of distributions over problems?
    – varying number of classes / examples per class (meta-training vs. meta-testing)?
    – semantic differences between meta-training vs. meta-testing classes?
    – overlap in meta-training vs. meta-testing classes (see the recent “low-shot” literature)?
  • Move from static to interactive learning
    – how should this impact how we generate episodes?
    – meta-active learning? (few successes so far)

Slide Credit: Hugo Larochelle