SLIDE 1

Meta-Learning Unsupervised Update Rules

Paper by Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

SLIDE 2

Outline

  • Motivation
  • Problem Breakdown
  • Method Overview
  • Meta-Learning Setup
  • Inner Loop
  • Outer Loop
  • Experimental Results
  • Critiques

SLIDE 5

Motivation


Unsupervised learning enables representation learning on mountains of unlabeled data for downstream tasks.

Unsupervised Learning Rules

  • VAE: severe overfitting to the training space.
  • GANs: great for images, weak on discrete data (e.g., text).
  • Both: the learning rule itself is not unsupervised (e.g., it relies on a surrogate loss).

Question: Can we meta-learn an unsupervised learning rule?

SLIDE 7

Semi-Supervised Few-Shot Classification

[Figure: pipeline. An unsupervised rule tunes the encoder on unlabeled training examples (x1…x5); the tuned encoder maps labeled training examples (x1…x4) to compact vectors; a model is then fit on those vectors with labels (y1…y4).]

Can we meta-learn this unsupervised learning rule?

SLIDE 8

Learning the Learning Rule

Backpropagation: Δφ ∝ −∇φ L(φ), a gradient step on a hand-designed loss L.

Unsupervised update: Δφ = f_θ(x, φ), a learned update rule with meta-parameters θ and no explicit loss.
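To make the contrast concrete, here is a minimal NumPy sketch, not the paper's actual rule: the gradient step requires a hand-written loss, while the learned update is produced directly by a parametric function of activations. `loss_grad` and `f_theta` are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.normal(size=(8, 4))           # base-model parameters (toy encoder weights)
x = rng.normal(size=(4,))               # one unlabeled input

# Backpropagation: the update follows the gradient of a hand-designed loss.
def loss_grad(phi, x):
    # Gradient of the toy surrogate loss 0.5 * ||phi @ x||^2 w.r.t. phi.
    return np.outer(phi @ x, x)

phi_backprop = phi - 1e-2 * loss_grad(phi, x)

# Learned unsupervised update: a parametric function f_theta proposes the
# change directly; no loss is ever written down.
theta = rng.normal(size=(8, 8)) * 0.01  # meta-parameters, learned in the outer loop
def f_theta(theta, phi, x):
    h = np.tanh(phi @ x)                # base-model activations
    return theta @ np.outer(h, x)       # update built from (h, x), shaped like phi

phi_learned = phi + f_theta(theta, phi, x)
```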

SLIDE 9

Method Overview

Outer loop

  • Optimize the meta-objective: the few-shot performance of features learned by the rule, as a function of the meta-parameters θ.

Inner loop

  • Learn the encoder using the unsupervised update rule.

SLIDE 11

Meta-Learning Setup

  • Inner loop applies an unsupervised learning algorithm on unlabeled data.
  • Outer loop evaluates the unsupervised learning algorithm using labeled data.

SLIDE 14

Inner Loop

Question: Given a base model, g(x; φ), which encodes inputs into compact vectors, how do we learn its parameters φ to give useful features?

Idea: What if we use another neural network to generate a neuron-specific error signal? Then we can learn its parameters θ (the meta-parameters) to produce useful error signals.

SLIDE 15

Inner Loop: Forward Pass

1) Take an input
2) Generate intermediate activations
3) Produce a feature representation
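A minimal forward-pass sketch, assuming a two-layer tanh MLP with made-up sizes; the paper's base model g(x; φ) is an MLP, but the architecture details here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [16, 32, 8]                       # input -> hidden -> feature (assumed)
phi = [(rng.normal(size=(m, n)) * 0.1, np.zeros(m))
       for n, m in zip(sizes[:-1], sizes[1:])]  # (W_i, b_i) per layer

def forward(phi, x):
    """Return all activations h_0..h_L (h_0 is the input itself)."""
    hs = [x]
    for W, b in phi:
        hs.append(np.tanh(W @ hs[-1] + b))
    return hs

x = rng.normal(size=(16,))                # 1) take an input
hs = forward(phi, x)                      # 2) intermediate activations
feature = hs[-1]                          # 3) feature representation
```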

SLIDE 16

Inner Loop: Generate Error Signal

1) Feed each layer's activations through an MLP
2) Output an error vector
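A minimal sketch of the error-generating network, assuming a tiny two-layer MLP with meta-parameters θ applied independently to each unit. The paper feeds each neuron a richer set of inputs; using only the activation here is a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                     # per-neuron error dimension (assumed)
theta = (rng.normal(size=(8, 1)) * 0.1,   # layer 1 of the error MLP
         rng.normal(size=(d, 8)) * 0.1)   # layer 2 of the error MLP

def error_signal(theta, h):
    """Map one layer's activations h (n,) to per-neuron error vectors (n, d)."""
    W1, W2 = theta
    z = np.tanh(W1 @ h[None, :])          # (8, n): same tiny MLP run on each unit
    return (W2 @ z).T                     # (n, d): one error vector per neuron

h = np.tanh(rng.normal(size=(32,)))       # one layer's activations
delta = error_signal(theta, h)            # error signal for that layer
```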

SLIDE 17

Inner Loop: Backward Pass

1) Initialize the top-level error with the output of the MLP
2) Backprop the error
3) Linearly combine the output from the MLP with the backpropagated error
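A sketch of one backward step under these assumptions: a tanh layer, and a fixed mixing coefficient `alpha` standing in for the learned combination.

```python
import numpy as np

rng = np.random.default_rng(0)
W2 = rng.normal(size=(8, 16)) * 0.1       # weights between layers 1 and 2
h1 = np.tanh(rng.normal(size=(16,)))      # layer-1 activations
mlp_err1 = rng.normal(size=(16,))         # error MLP output for layer 1
delta2 = rng.normal(size=(8,))            # 1) top error initialized from the MLP

# 2) backprop the top error through the weights (tanh derivative is 1 - h^2)
backprop_err1 = (W2.T @ delta2) * (1.0 - h1 ** 2)

# 3) linearly combine the MLP's error with the backpropagated error
alpha = 0.5                               # assumed mixing coefficient
delta1 = alpha * mlp_err1 + (1.0 - alpha) * backprop_err1
```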

SLIDE 18

Inner Loop: Update φ

φ consists of all base-model parameters Wi, Vi, and bi. Updates like ΔWi and ΔVi are linear* functions of the local error quantities hi−1 and hi.

*There are also nonlinear normalizations within this function
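A minimal sketch of a single weight update under the stated form: an outer product of the local error and activations, followed by a normalization. The step scale and the exact normalization are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
h_prev = np.tanh(rng.normal(size=(16,)))  # h_{i-1}: previous-layer activations
delta = rng.normal(size=(8,))             # delta_i: this layer's error signal
W = rng.normal(size=(8, 16)) * 0.1        # W_i: weights to update

dW = np.outer(delta, h_prev)              # linear in the local error quantities
dW /= (np.linalg.norm(dW) + 1e-8)         # *nonlinear normalization of the step
W = W + 1e-2 * dW                         # apply the update with a small scale
```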

SLIDE 19

Inner Loop: Key Points

  • The error-generating network replicates the mechanics of backprop for unsupervised learning.
  • Iterative updates tune φ toward some higher-level objective.
  • The outer loop sets that objective by modifying the error-generating function.

SLIDE 23

Outer Loop

SLIDE 27

Outer Loop: Compute Meta-Objective

[Figure: outer-loop pipeline. The unsupervised rule with meta-parameters θ tunes the encoder on an unlabeled support set (x1…x5). The tuned encoder embeds the labeled support set (x1…x4, labels y1…y4) and the labeled query examples (x*1, x*2). A linear model is fit on the support features and evaluated on the query features; the mean-squared error on the query labels is the meta-objective.]

Backprop the meta-objective all the way back to θ.

Truncated backprop through the unrolled inner loop keeps this tractable.
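A toy end-to-end sketch of the meta-objective, assuming a one-layer encoder, a scalar stand-in for the learned rule, and a least-squares linear readout; the paper differentiates this scalar back to θ with truncated backprop, which is omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)
phi = rng.normal(size=(8, 16)) * 0.1       # encoder parameters

def encode(phi, X):                        # one-layer encoder (assumed)
    return np.tanh(X @ phi.T)

def unsup_update(theta, phi, X):           # stand-in for the learned rule
    h = encode(phi, X)
    return phi + theta * (h.T @ X) / len(X)

theta = 1e-2                               # scalar meta-parameter (toy)
X_unlab = rng.normal(size=(20, 16))        # unlabeled support set
X_sup, Y_sup = rng.normal(size=(10, 16)), rng.normal(size=(10, 3))
X_qry, Y_qry = rng.normal(size=(5, 16)),  rng.normal(size=(5, 3))

for _ in range(3):                         # inner loop: tune the encoder
    phi = unsup_update(theta, phi, X_unlab)

H_sup, H_qry = encode(phi, X_sup), encode(phi, X_qry)
W, *_ = np.linalg.lstsq(H_sup, Y_sup, rcond=None)   # fit linear model on support
meta_objective = np.mean((H_qry @ W - Y_qry) ** 2)  # MSE on the query set
```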

SLIDE 28

Results

Training data: CIFAR-10 & ImageNet.

  • Generalization over datasets.
  • Generalization over domains.
  • Generalization over network architectures.

SLIDE 29

Results: Generalization over datasets

What's going on?

  • Evaluation of the unsupervised learning rule on different datasets.
  • Comparison to other methods.

SLIDE 30

Results: Generalization over Domains

What's going on? Evaluation of the unsupervised learning rule on 2-way text classification, comparing 30 hours vs. 200 hours of meta-training.

SLIDE 31

Results: Generalization over Networks

What's going on? Evaluation of the unsupervised learning rule on different network architectures.

SLIDE 32

Critiques: Limitations

  • Computationally expensive: 8 days on 512 workers.
  • Many, many tricks; lack of ablative analysis.
  • Reproducibility: how many labeled examples? How many unlabeled?

SLIDE 33

Critiques: Suggestions

  • Ablative analysis.
  • Implicit MAML?
  • Investigate generalization to CNN and attention-based models.
  • A better way to encode the learning rule? Is this architecture expressive enough?
