SLIDE 1

Learning to Learn Kernels with Variational Random Features

Xiantong Zhen*, Haoliang Sun*, Yingjun Du*, Jun Xu, Yilong Yin, Ling Shao, Cees Snoek

Presenter: Haoliang Sun

ICML | 2020

SLIDE 2

Meta-Learning (Learning to Learn)

[Diagram: a sequence of related tasks t1, t2, t3, … with corresponding datasets D1, D2, D3, …, and a new task t’ with dataset D’]

Ø Extract prior (meta) knowledge from related tasks (meta learner)
Ø Fast adaptation to a new task (base learner)

Ø Good parameter initialization (Finn et al., 2017)
Ø Efficient optimization update rules (Ravi et al., 2017)
Ø General feature extractors (Vinyals et al., 2016)
...

Meta Knowledge:

[Diagram: a meta learner extracts meta knowledge from base learners 1–3 and transfers it to a new learner. Caption: Meta-Learning.]

SLIDE 3

Few-Shot Learning (FSL) with Meta-Learning (ML)

Ø The episodic training-testing strategy
  • meta-training: a meta-learner is trained to enhance base-learners’ performance on the meta-training set with a batch of few-shot learning tasks
  • meta-testing: base-learners are evaluated on the meta-test set with novel categories of data
Ø An episode (task)
  • sample 𝐷-way 𝑙-shot classification tasks from the meta-training (testing) set
  • 𝑙 is the number of labelled examples for each of the 𝐷 classes
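The episode construction above can be sketched in a few lines of Python; the toy dataset, its labels, and the 15-query-per-class choice are illustrative assumptions, not fixed by the slides.

```python
import random

def sample_episode(dataset, D=5, l=1, query_per_class=15, rng=None):
    """Sample one D-way l-shot episode (task) from a labelled dataset.

    `dataset` maps class label -> list of examples; D classes are drawn,
    each contributing l support shots and held-out query examples.
    """
    rng = rng or random.Random()
    classes = rng.sample(sorted(dataset), D)           # pick D classes
    support, query = [], []
    for c in classes:
        examples = rng.sample(dataset[c], l + query_per_class)
        support += [(x, c) for x in examples[:l]]      # l labelled shots
        query += [(x, c) for x in examples[l:]]        # held-out queries
    return support, query

# toy dataset: 10 classes with 20 examples each (illustrative)
data = {c: [f"img_{c}_{i}" for i in range(20)] for c in range(10)}
S, Q = sample_episode(data, D=5, l=1, query_per_class=15,
                      rng=random.Random(0))
```

At meta-training time a batch of such episodes is drawn per iteration; at meta-testing time the classes come from the disjoint meta-test split.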
SLIDE 4

Few-Shot Learning (FSL) with Meta-Learning (ML)

[Figure: example of the few-shot learning setup (Ravi et al., 2017), showing Episode 1 and Episode 2]

SLIDE 5

An Effective Meta-Learning Scenario

Ø Base-learner:
  • be powerful to solve individual tasks
  • be able to absorb common information

Ø Meta-learner:
  • extract valid prior knowledge

Key idea:

Ø integrate kernel learning with random features and variational inference (VI) into the ML framework for FSL
Ø formulate the optimization as a VI problem by deriving a new ELBO
Ø a context inference puts the inference of random bases of the current task into the context of all previous, related tasks

SLIDE 6

Learning adaptive kernels with data-driven random Fourier features

Problem Statement

Meta-learning with kernels: a practical base-learner (kernel ridge regression)

For task t, with support set S_t = {(x_i, y_i)}, query set Q_t, base-learner f_t, loss L, and feature map φ inducing the kernel K = φ(X)φ(X)ᵀ on the support inputs X, the closed-form solution is α = (K + λI)⁻¹ Y, and the predictor on a query input x is ŷ = k(x, X) α.
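The closed-form base-learner can be sketched in numpy as follows; the RBF kernel, the λ value, and the toy sine task are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def krr_fit(K, Y, lam=1e-3):
    """Closed-form KRR solution on the support set: alpha = (K + lam*I)^{-1} Y."""
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), Y)

def krr_predict(K_qs, alpha):
    """Predictor on the query set: y_hat = K(query, support) @ alpha."""
    return K_qs @ alpha

# toy 10-shot regression task: y = sin(x), RBF kernel (illustrative)
rbf = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2)
xs = np.linspace(-3, 3, 10)           # support inputs
ys = np.sin(xs)                       # support targets
xq = np.linspace(-3, 3, 50)           # query inputs
alpha = krr_fit(rbf(xs, xs), ys)
yq = krr_predict(rbf(xq, xs), alpha)
```

Because adaptation reduces to one small linear solve per task, KRR is cheap enough to sit in the inner loop of meta-training.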

SLIDE 7

Problem Statement

Random Fourier Features (RFFs)

Ø learn adaptive kernels in a data-driven way
Ø leverage the shared knowledge by exploring dependencies among related tasks to generate rich features
Ø construct approximate translation-invariant kernels using explicit feature maps via random bases (Bochner’s theorem)

Learning data-driven adaptive kernels amounts to finding the posterior over the random bases, which is formulated as a variational inference problem.
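A minimal sketch of the random-feature construction under Bochner's theorem, assuming the bases are drawn from a fixed Gaussian spectral distribution (which recovers the RBF kernel); MetaVRF's point is to infer the bases per task rather than fix this distribution.

```python
import numpy as np

def rff_features(X, omega, b):
    """Explicit feature map z(x) = sqrt(2/D) * cos(x @ omega + b); inner
    products of z approximate a translation-invariant kernel."""
    D = omega.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ omega + b)

rng = np.random.default_rng(0)
d, D = 3, 5000                        # input dim, number of random bases
omega = rng.normal(size=(d, D))       # Gaussian spectral density <-> RBF kernel
b = rng.uniform(0.0, 2.0 * np.pi, size=D)

x = rng.normal(size=d)
y = rng.normal(size=d)
approx = (rff_features(x[None], omega, b) @ rff_features(y[None], omega, b).T).item()
exact = float(np.exp(-0.5 * np.sum((x - y) ** 2)))   # RBF kernel value
```

The approximation error shrinks as O(1/sqrt(D)), which is why a learned, task-adapted spectral distribution can afford far fewer bases than a fixed one.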

SLIDE 8

Meta Variational Random Features (MetaVRF)

Ø The posterior is intractable; approximate it by using a meta variational distribution
Ø The Evidence Lower Bound (ELBO)
Ø The objective (maximizing the ELBO w.r.t. tasks)

Variational distribution: q(ω | S)

ELBO: log p(y | x, S) ≥ E_{q(ω|S)}[ log p(y | x, S, ω) ] − KL( q(ω|S) ‖ p(ω | x, S) )

The objective function: maximize the ELBO summed over all training tasks.
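The ELBO can be estimated by Monte Carlo with the reparameterization trick. This sketch assumes a diagonal-Gaussian variational distribution and, for simplicity, a standard-normal prior over the bases (MetaVRF's actual prior is conditioned on the task and context); the likelihood function is a stand-in, not the paper's predictive model.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) )."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

def elbo(mu, logvar, log_likelihood, n_samples=8):
    """ELBO = E_q[log p(y | x, S, omega)] - KL(q(omega|S) || p(omega)),
    estimated with reparameterized samples omega = mu + sigma * eps."""
    sigma = np.exp(0.5 * logvar)
    ll = 0.0
    for _ in range(n_samples):
        omega = mu + sigma * rng.normal(size=mu.shape)  # differentiable sample
        ll += log_likelihood(omega)
    return ll / n_samples - gaussian_kl(mu, logvar)

# stand-in likelihood: Gaussian centred at omega = 1 (illustrative only)
toy_ll = lambda omega: -0.5 * np.sum((omega - 1.0) ** 2)
value = elbo(np.zeros(4), np.zeros(4), toy_ll)
```

Reparameterization keeps the sampled bases differentiable in (mu, logvar), so the per-task ELBOs can be summed and maximized end to end.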

SLIDE 9

Context Inference

Ø generate rich random bases to build strong kernels
Ø put the inference of bases of the current task into the context of all previous, related tasks
Ø the context of related tasks up to the t-th task

[Figure: the directed graphical model, in which the bases of the t-th task depend on the input x, the support set S_t, and the context dependency C]

SLIDE 10

An LSTM-Based Context Inference Network

Ø LSTM transformation with input of the support set and previous cell states
Ø shared MLPs for inference output the parameters of the variational distribution
Ø The optimization objective with the context inference
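A numpy sketch of the idea: a plain LSTM cell carries cell state across a sequence of tasks, and two shared linear heads (standing in for the shared MLPs) output the mean and log-variance of the variational distribution. All sizes, the weight initialization, and the pooled support-set embedding are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x, h, c, W):
    """One LSTM step: input/forget/output/candidate gates from [x; h]."""
    z = W @ np.concatenate([x, h])
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

d_in, d_h, D = 8, 16, 4              # illustrative sizes
W = rng.normal(scale=0.1, size=(4 * d_h, d_in + d_h))
W_mu = rng.normal(scale=0.1, size=(D, d_h))     # shared heads that output
W_lv = rng.normal(scale=0.1, size=(D, d_h))     # the parameters of q

h, c = np.zeros(d_h), np.zeros(d_h)
for t in range(3):                   # a sequence of related tasks
    s_t = rng.normal(size=d_in)      # pooled embedding of task t's support set
    h, c = lstm_cell(s_t, h, c, W)   # cell state carries context across tasks
    mu, logvar = W_mu @ h, W_lv @ h  # parameters of q(omega | S_t, context)
```

The recurrent cell state is what "puts" each task's inference into the context of the previous, related tasks.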

SLIDE 11

SLIDE 12

Experiments

Ø Few-Shot Regression

  • Fitting a target sine function

Ø Few-Shot Classification

  • Three benchmarks

Ø Further analysis

  • Deep embedding
  • Efficiency
  • Versatility
SLIDE 13

Evaluation: Few-Shot Regression

Figure 1: fitting a target sine function. Reported MSE values:
  • 3-shot: 1.913, 1.072, 0.722, 0.700
  • 5-shot: 0.415, 0.063, 0.047, 0.022
  • 10-shot: 0.294, 0.024, 0.009, 0.003

SLIDE 14

Evaluation: Few-Shot Classification

SLIDE 15

Evaluation: Few-Shot Classification

SLIDE 16

Further Analysis

SLIDE 17

Further Analysis

SLIDE 18

Further Analysis

SLIDE 19

Conclusion

v A novel meta-learning framework, MetaVRF, which introduces RFFs into the meta-learning framework and leverages VI to infer the spectral distribution in a data-driven way.
v The LSTM-based context inference explores the shared knowledge and generates rich random features.
v Achieves state-of-the-art performance.
v Learned kernels exhibit high representational power at a low spectral sampling rate.
v Robust and flexible under a great variety of testing conditions.

SLIDE 20