

SLIDE 1

CS 4803 / 7643: Deep Learning

Zsolt Kira, Georgia Tech

Topics:

– Low-label ML Formulations

SLIDE 2

Administrativia

  • Projects!
  • Project Check-in due April 11th
    – Will be graded pass/fail; if you fail, you can address the issues
    – Counts for 5 points of the project score
  • Poster due date moved to April 23rd (last day of class)
    – No presentations
  • Final submission due date April 30th

SLIDE 3

Types of Learning

  • Important note:
    – Your project should include doing something beyond just downloading open-source code and tuning hyperparameters.
    – This can include:
      • implementation of additional approaches (if leveraging open-source code),
      • theoretical analysis, or
      • a thorough investigation of some phenomena.
  • When using external resources, provide references to anything you used in the write-up!

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

SLIDE 4

But wait, there’s more!

  • Transfer Learning
  • Domain adaptation
  • Semi-supervised learning
  • Zero-shot learning
  • One/Few-shot learning
  • Meta-Learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning


SLIDE 5

Transfer Learning


A Survey on Transfer Learning, Sinno Jialin Pan and Qiang Yang (IEEE)

SLIDE 6

Taskonomy

Builds a graph of transferability between computer vision tasks:

  • 1. Collect a dataset of 4 million input images and labels for 26 vision tasks
    a. Surface normals, depth estimation, segmentation, 2D keypoints, 3D pose estimation
  • 2. Train a convolutional autoencoder architecture for each task

http://taskonomy.stanford.edu/

Slide Credit: Camilo & Higuera
Taskonomy: Disentangling Task Transfer Learning, Amir R. Zamir, Alexander Sax*, William B. Shen*, Leonidas Guibas, Jitendra Malik, Silvio Savarese

SLIDE 7

Taskonomy

Builds a graph of transferability between computer vision tasks:

  • 3. Transferability obtained by the Analytic Hierarchy Process (from pairwise comparisons between all possible sources for each target task)
  • 4. Final graph obtained by subgraph selection optimization (best performance from a limited set of source tasks): the transfer policy
  • Empirical study of performance and data-efficiency gains from transfer using different datasets (Places and ImageNet)

Slide Credit: Camilo & Higuera
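
The transfer-policy step lends itself to a small illustration. Below is a toy Python sketch, not the authors' code: the paper solves this step with binary integer programming, and `transfer_score` here simply stands in for the AHP-derived affinities. It picks a budget-limited set of source tasks so that every target gets its best available source.

```python
from itertools import combinations

def best_transfer_policy(tasks, transfer_score, budget):
    """Brute-force stand-in for step 4: choose `budget` source tasks that
    maximize total transferability, assigning each target its best source.
    transfer_score[src][tgt] is an assumed normalized transfer affinity."""
    best_set, best_total = None, float("-inf")
    for sources in combinations(tasks, budget):
        # Each target task is served by its single best chosen source.
        total = sum(max(transfer_score[s][t] for s in sources) for t in tasks)
        if total > best_total:
            best_set, best_total = set(sources), total
    return best_set
```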

SLIDE 8

Taskonomy

Slide Credit: Camilo & Higuera

SLIDE 9

But wait, there’s more!

  • Transfer Learning
  • Domain adaptation
  • Semi-supervised learning
  • Zero-shot learning
  • One/Few-shot learning
  • Meta-Learning
  • Continual / Lifelong-learning
  • Multi-modal learning
  • Multi-task learning
  • Active learning


SLIDE 10

Reducing Label Requirements

  • Alternative solution to gathering more data: exploit other sources of data that are imperfect but plentiful
    – Unlabeled data (unsupervised learning)
    – Multi-modal data (multimodal learning)
    – Multi-domain data (transfer learning, domain adaptation)

SLIDE 11

Few-Shot Learning

Slide Credit: Hugo Larochelle

SLIDE 12

Few-Shot Learning

Slide Credit: Hugo Larochelle

SLIDE 13

Few-Shot Learning


  • Let’s attack the problem of few-shot learning directly
    – we want to design a learning algorithm A that outputs good parameters θ of a model M when fed a small dataset Dtrain = {(x_i, y_i)}_{i=1..N}
  • Idea: let’s learn that algorithm A, end-to-end
    – this is known as meta-learning, or learning to learn
  • Rather than features, in few-shot learning we aim to transfer the complete training of the model to new datasets (not just the features or an initialization)
    – ideally there should be no human involved in producing a model for a new dataset

Slide Credit: Hugo Larochelle
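
To make the setup concrete, here is a minimal sketch of how such a small Dtrain (the "support set") and its accompanying test set (the "query set") are typically sampled in N-way, k-shot benchmarks; `data_by_class` is an assumed dict mapping each class to its list of examples (each list must hold at least k_shot + n_query items).

```python
import random

def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
    """Build one N-way, k-shot episode: a tiny Dtrain (support set)
    and a Dtest (query set) drawn from the same N sampled classes."""
    classes = random.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        examples = random.sample(data_by_class[cls], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```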

SLIDE 14

Prior Methods


  • One-shot learning has been studied before
    – One-shot learning of object categories (2006), Fei-Fei Li, Rob Fergus, and Pietro Perona
    – Knowledge transfer in learning to recognize visual object classes (2004), Fei-Fei Li
    – Object classification from a single example utilizing class relevance pseudo-metrics (2004), Michael Fink
    – Cross-generalization: learning novel classes from a single example by feature replacement (2005), Evgeniy Bart and Shimon Ullman
  • These largely relied on hand-engineered features
    – with recent progress in end-to-end deep learning, we hope to learn a representation better suited for few-shot learning

Slide Credit: Hugo Larochelle

SLIDE 15

Prior Meta-Learning Methods


  • Early work on learning an update rule
    – Learning a synaptic learning rule (1990), Yoshua Bengio, Samy Bengio, and Jocelyn Cloutier
    – The Evolution of Learning: An Experiment in Genetic Connectionism (1990), David Chalmers
    – On the search for new learning rules for ANNs (1995), Samy Bengio, Yoshua Bengio, and Jocelyn Cloutier
  • Early work on recurrent networks modifying their weights
    – Learning to control fast-weight memories: An alternative to dynamic recurrent networks (1992), Jürgen Schmidhuber
    – A neural network that embeds its own meta-levels (1993), Jürgen Schmidhuber

Slide Credit: Hugo Larochelle

SLIDE 16

Related Work: Meta-Learning


  • Training a recurrent neural network to optimize
    – it outputs the update, so it can decide to do something other than gradient descent
  • Learning to learn by gradient descent by gradient descent (2016), Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W. Hoffman, David Pfau, Tom Schaul, and Nando de Freitas
  • Learning to learn using gradient descent (2001), Sepp Hochreiter, A. Steven Younger, and Peter R. Conwell

Slide Credit: Hugo Larochelle

SLIDE 17

Related Work: Meta-Learning


  • Hyper-parameter optimization
    – the idea of learning the learning rates and the initialization conditions
  • Gradient-based hyperparameter optimization through reversible learning (2015), Dougal Maclaurin, David Duvenaud, and Ryan P. Adams

Slide Credit: Hugo Larochelle

SLIDE 18

Related Work: Meta-Learning


  • AutoML (Bayesian optimization, reinforcement learning)
  • Neural Architecture Search with Reinforcement Learning (2017), Barret Zoph and Quoc Le

Slide Credit: Hugo Larochelle

SLIDE 19

Meta-Learning


  • Learning algorithm A
    – input: training set
    – output: parameters θ of model M (the learner)
    – objective: good performance on test set
  • Meta-learning algorithm
    – input: meta-training set of episodes
    – output: parameters Θ of algorithm A (the meta-learner)
    – objective: good performance on meta-test set

Slide Credit: Hugo Larochelle
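
Putting this nomenclature together, a generic episodic meta-training loop looks roughly like the sketch below (PyTorch). This is a sketch, not a specific paper's algorithm: `adapt` and `loss` are hypothetical methods standing in for whatever a particular meta-learner does with an episode, and `sample_episode` is the earlier sketch.

```python
import torch

def meta_train(meta_learner, data_by_class, n_episodes, lr=1e-3):
    """Generic episodic meta-training: optimize the meta-learner's
    parameters Θ so that the adapted learner does well on each
    episode's test (query) set."""
    opt = torch.optim.Adam(meta_learner.parameters(), lr=lr)
    for _ in range(n_episodes):
        support, query = sample_episode(data_by_class)
        learner = meta_learner.adapt(support)  # A(Dtrain; Θ) -> learner with params θ
        loss = learner.loss(query)             # evaluate θ on the episode's test set
        opt.zero_grad()
        loss.backward()                        # differentiate through the adaptation
        opt.step()                             # update Θ
    return meta_learner
```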

SLIDE 20

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 21

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 22

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 23

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 24

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 25

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 26

Meta-Learning

Slide Credit: Hugo Larochelle

SLIDE 27

Meta-Learning Nomenclature

Slide Credit: Hugo Larochelle

SLIDE 28

Meta-Learning Nomenclature

  • Assuming a probabilistic model M over labels, the cost per episode can become the negative log-likelihood of the episode's test labels (see the reconstruction below)
  • Depending on the choice of meta-learner, this cost will take a different form

Slide Credit: Hugo Larochelle
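
The slide's equation isn't reproduced in this transcript. Following the nomenclature above (learner parameters θ produced by algorithm A from Dtrain, meta-learner parameters Θ), a plausible reconstruction of the per-episode cost is:

```latex
\mathcal{C}(\Theta) \;=\; \sum_{(x,\,y)\,\in\,D_{\text{test}}} -\log p_{\theta}\!\left(y \mid x\right),
\qquad \theta = A\!\left(D_{\text{train}};\, \Theta\right)
```

i.e., the negative log-likelihood that the adapted learner assigns to the episode's test labels.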

SLIDE 29

Meta-Learner

  • How to parametrize learning algorithms?
  • Two approaches to defining a meta-learner
    – Take inspiration from a known learning algorithm
      • kNN/kernel machine: Matching Networks (Vinyals et al. 2016)
      • Gaussian classifier: Prototypical Networks (Snell et al. 2017)
      • Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al. 2017)
    – Derive it from a black-box neural network
      • MANN (Santoro et al. 2016)
      • SNAIL (Mishra et al. 2018)

Slide Credit: Hugo Larochelle

SLIDE 30

Meta-Learner

  • How to parametrize learning algorithms?
  • Two approaches to defining a meta-learner
    – Take inspiration from a known learning algorithm
      • kNN/kernel machine: Matching Networks (Vinyals et al. 2016)
      • Gaussian classifier: Prototypical Networks (Snell et al. 2017)
      • Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al. 2017)
    – Derive it from a black-box neural network
      • MANN (Santoro et al. 2016)
      • SNAIL (Mishra et al. 2018)

Slide Credit: Hugo Larochelle

SLIDE 31

Matching Networks

Slide Credit: Hugo Larochelle
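
The Matching Networks figures aren't reproduced here, but the core computation is compact enough to sketch. This is a simplified version (it omits the paper's full-context embeddings); `embed` is an assumed encoder network.

```python
import torch
import torch.nn.functional as F

def matching_net_predict(embed, support_x, support_y, query_x, n_way):
    """Matching-Networks-style prediction: each query's label distribution
    is a cosine-similarity-weighted average of the support-set labels."""
    s = F.normalize(embed(support_x), dim=1)      # (n_support, d)
    q = F.normalize(embed(query_x), dim=1)        # (n_query, d)
    attention = F.softmax(q @ s.t(), dim=1)       # attention over support items
    labels = F.one_hot(support_y, n_way).float()  # (n_support, n_way)
    return attention @ labels                     # (n_query, n_way) probabilities
```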

SLIDE 32

Prototypical Networks

Slide Credit: Hugo Larochelle

SLIDE 33

Prototypical Networks

Slide Credit: Hugo Larochelle
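
Again the figures are missing, so here is a minimal sketch of the Prototypical Networks computation (Snell et al. 2017): class prototypes are mean support embeddings, and queries are scored by negative squared Euclidean distance to each prototype (`embed` is an assumed encoder).

```python
import torch

def proto_net_logits(embed, support_x, support_y, query_x, n_way):
    """Prototypical Networks in a nutshell: prototype = mean embedding of
    a class's support examples; classify queries by distance to prototypes."""
    z_s = embed(support_x)                                         # (n_support, d)
    z_q = embed(query_x)                                           # (n_query, d)
    prototypes = torch.stack(
        [z_s[support_y == c].mean(dim=0) for c in range(n_way)])   # (n_way, d)
    return -torch.cdist(z_q, prototypes) ** 2                      # feed to softmax/CE
```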

SLIDE 34

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 35

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 36

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 37

Meta-Learner LSTM

Slide Credit: Hugo Larochelle

SLIDE 38

Meta-Learning Algorithm

SLIDE 39

Meta-Learner LSTM

Slide Credit: Hugo Larochelle
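
The Meta-Learner LSTM figures aren't reproduced here; the core idea of Ravi & Larochelle (2017) is to treat the learner's parameters as the LSTM cell state, with the candidate cell value being the negative gradient:

```latex
\theta_t \;=\; f_t \odot \theta_{t-1} \;+\; i_t \odot \tilde{\theta}_t,
\qquad \tilde{\theta}_t \;=\; -\nabla_{\theta_{t-1}} \mathcal{L}_t
```

Setting f_t = 1 and i_t = α recovers plain gradient descent; the meta-learner instead learns to produce the gates f_t and i_t (from the loss, the gradient, and the current parameters), i.e., it learns the update rule itself.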

SLIDE 40

Model-Agnostic Meta-Learning (MAML)

Slide Credit: Hugo Larochelle

SLIDE 41

Model-Agnostic Meta-Learning (MAML)

Slide Credit: Sergey Levine

SLIDE 42

Model-Agnostic Meta-Learning (MAML)

Slide Credit: Sergey Levine
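
The MAML figures are likewise missing; the two-level structure can be sketched in a few lines of PyTorch. This is a sketch (single inner step, no batching over tasks), and `functional_forward` is a hypothetical helper that runs the model with an explicitly supplied parameter list.

```python
import torch

def maml_episode_loss(model, functional_forward, loss_fn,
                      support, query, inner_lr=0.01):
    """One MAML inner/outer computation on a single episode."""
    (x_s, y_s), (x_q, y_q) = support, query
    # Inner loop: one gradient step on the support set, keeping the graph
    # so the outer loss can differentiate through the adaptation.
    inner_loss = loss_fn(model(x_s), y_s)
    grads = torch.autograd.grad(inner_loss, list(model.parameters()),
                                create_graph=True)
    adapted = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]
    # Outer objective: loss of the adapted parameters θ' on the query set;
    # backpropagating it updates the initialization θ (the meta-parameters).
    return loss_fn(functional_forward(model, x_q, adapted), y_q)
```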

SLIDE 43

Comparison

Slide Credit: Sergey Levine

SLIDE 44

Meta-Learner

  • How to parametrize learning algorithms?
  • Two approaches to defining a meta-learner
    – Take inspiration from a known learning algorithm
      • kNN/kernel machine: Matching Networks (Vinyals et al. 2016)
      • Gaussian classifier: Prototypical Networks (Snell et al. 2017)
      • Gradient Descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al. 2017)
    – Derive it from a black-box neural network
      • MANN (Santoro et al. 2016)
      • SNAIL (Mishra et al. 2018)

Slide Credit: Hugo Larochelle

SLIDE 45

Black-Box Meta-Learner

Slide Credit: Hugo Larochelle

SLIDE 46

Memory-Augmented Neural Network

Slide Credit: Hugo Larochelle

SLIDE 47

Experiments

Slide Credit: Hugo Larochelle

SLIDE 48

Experiments

Slide Credit: Hugo Larochelle

SLIDE 49

Extensions and Variations

Slide Credit: Hugo Larochelle

SLIDE 50

But beware

Slide Credit: Hugo Larochelle
A Closer Look at Few-shot Classification, Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang

SLIDE 51

SLIDE 52

Distribution Shift

  • What if there is a distribution shift (cross-domain)?
  • Lesson: Methods that are successful within-domain might be worse across domains!

SLIDE 53

Distribution Shift

SLIDE 54

Random Task Proposals

SLIDE 55

Does it Work?

SLIDE 56

Discussions


  • What is the right definition of distributions over problems?
    – varying number of classes / examples per class (meta-training vs. meta-testing)?
    – semantic differences between meta-training vs. meta-testing classes?
    – overlap in meta-training vs. meta-testing classes (see the recent “low-shot” literature)?
  • Move from static to interactive learning
    – how should this impact how we generate episodes?
    – meta-active learning? (few successes so far)

Slide Credit: Hugo Larochelle