SLIDE 1

Meta-Learning of Structured Representation by Proximal Mapping

Mao Li, Yingyi Ma, Xinhua Zhang University of Illinois at Chicago

SLIDE 2

Motivation

Goal of meta-learning: extract prior structures from a set of tasks that allow efficient learning of new tasks. Examples of structural regularities:

  • Instance level
    • Input layers: transformations beyond group-based diffeomorphisms
    • Within layers: sparsity, disentanglement, spatial invariance, structured gradients accounting for data covariance, manifold smoothness
    • Between layers: equivariance, contractivity, robustness under dropout and adversarial perturbations of preceding nodes
  • Batch/dataset level
    • multi-view, multi-modality, multi-domain
    • diversity, fairness, privacy, causal structure
SLIDE 3

Existing Approaches

  • Data Augmentation (original data → augmented data → training data)

√ boosts prediction performance
× unclear whether the improvement is due to the learned representation or due to a better classifier
SLIDE 4

Existing Approaches

[Figure: auto-encoder pipeline — input → encoder → latent representation → decoder → reconstruction, with the latent representation feeding downstream tasks]

  • Auto-encoder

√ learns the most salient features
× usually used only as an initialization for a subsequent supervised task
× not amenable to end-to-end learning

Our goal: learn representations that explicitly encode structural priors in an end-to-end fashion.
SLIDE 5

Existing Approaches

  • Regularization

√ simple and efficient
× the regularizer and the supervised objective contend for the same weights
SLIDE 6

Proposed Method

Morph a representation z towards a structured one by proximal mapping; the regularizer promotes the desired structure.

Advantages:
+ decouples the regularization from supervised learning
+ extends meta-learning to unsupervised base learners

Embed the proximal mapping as a layer into deep networks.

z is a mini-batch or a single example; in meta-learning, a mini-batch corresponds to a task, and the proximal mapping to the task-specific base learner.
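The proximal-mapping formula itself is not rendered in this transcript. As a sketch of the standard definition, the prox of a regularizer Ω morphs z via P(z) = argmin_x ½‖x − z‖² + λΩ(x). For Ω(x) = ‖x‖₁ (promoting sparsity, one of the structures listed on the motivation slide), this has the well-known soft-thresholding closed form:

```python
import numpy as np

def prox_l1(z, lam):
    """Proximal map of lam * ||x||_1: argmin_x 0.5*||x - z||^2 + lam*||x||_1.
    Closed form: soft-thresholding, which sparsifies z."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

z = np.array([3.0, -0.5, 1.0])
print(prox_l1(z, 1.0))  # small entries are zeroed out, large ones shrink towards 0
```

The output z is thus pulled towards the structured set without the regularizer ever competing with the supervised loss for the same weights.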

SLIDE 7

Proposed Method

Morph a representation z towards a structured one by proximal mapping; here the structure-promoting regularizer uses L, the graph Laplacian (for smoothness on a manifold).

[Figure: the representation before and after the proximal map]
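A minimal sketch of this Laplacian prox (my notation, not necessarily the slides' exact formulation): with Ω(x) = λ·xᵀLx, the prox has the linear closed form x = (I + 2λL)⁻¹z, which smooths z along the graph:

```python
import numpy as np

def prox_laplacian(z, L, lam):
    """Prox of lam * x^T L x: argmin_x 0.5*||x - z||^2 + lam * x^T L x.
    Closed form: (I + 2*lam*L)^{-1} z -- a linear smoothing of z along
    the graph whose Laplacian is L."""
    n = L.shape[0]
    return np.linalg.solve(np.eye(n) + 2.0 * lam * L, z)

# Path graph on 3 nodes: Laplacian = degree matrix - adjacency matrix.
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
z = np.array([0.0, 1.0, 0.0])
x = prox_laplacian(z, L, 0.5)
print(x)  # the spike in z is spread to its neighbors; the mean is preserved
```

Because L has zero row sums, constants are fixed points of the map, so the prox reduces variation across neighboring nodes without shifting the overall mean.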

SLIDE 8

MetaProx for Multi-view Learning

In multi-view learning, observations are available as pairs of views {(x_i, y_i)}.

Figure 1: training framework of MetaProx. Feature extractors f and g compute features of view x and view y, the proximal layer couples them through a proximal map, and a supervised predictor h maps each view's features to its label.
SLIDE 9

MetaProx for Multi-view Learning

① feature extraction: the extractors f and g map view x and view y to their features (step ① in Figure 1).
SLIDE 10

MetaProx for Multi-view Learning

② proximal mapping: promote high correlation between the two views' features (step ② in Figure 1).
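The slides do not show the formula for this step. One toy version of a correlation-promoting prox (my own simplification for illustration, not the paper's formulation) couples the two views' features f and g with an inner-product reward and still admits a closed form:

```python
import numpy as np

def prox_corr(f, g, lam):
    """Toy correlation-promoting prox (simplified sketch; requires lam in [0, 1)):
        argmin_{x,y} 0.5*||x - f||^2 + 0.5*||y - g||^2 - lam * <x, y>
    Stationarity gives x - f = lam*y and y - g = lam*x, hence the closed
    form below; each view's features are pulled towards the other view's."""
    assert 0.0 <= lam < 1.0, "objective is jointly convex only for lam < 1"
    denom = 1.0 - lam ** 2
    x = (f + lam * g) / denom
    y = (g + lam * f) / denom
    return x, y

f = np.array([1.0, 0.0])
g = np.array([0.0, 1.0])
x, y = prox_corr(f, g, 0.5)
print(x, y)  # each output mixes in the other view, raising their alignment
```

At λ = 0 the map is the identity, so λ directly controls how strongly the two views are morphed towards agreement before the supervised predictor sees them.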

SLIDE 11

MetaProx for Multi-view Learning

③ supervised task: h is the supervised predictor, mapping each view's features to its label (step ③ in Figure 1).
SLIDE 12

MetaProx for Multi-view Learning

  • Optimize over the red variables in the figure (Figure 1) end-to-end.
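End-to-end optimization requires gradients through the proximal layer. As a sketch (assuming a prox with a linear closed form, x(z) = (I + 2λL)⁻¹z for the graph-Laplacian regularizer mentioned on an earlier slide; a hypothetical instance, not the paper's exact layer), the backward pass is just the transposed solve, which a finite-difference check confirms:

```python
import numpy as np

def prox_forward(z, L, lam):
    """Forward pass of a graph-Laplacian proximal layer: x = (I + 2*lam*L)^{-1} z."""
    A = np.eye(L.shape[0]) + 2.0 * lam * L
    return np.linalg.solve(A, z), A

def prox_backward(grad_x, A):
    """Backward pass: x = A^{-1} z is linear in z, so dLoss/dz = A^{-T} dLoss/dx."""
    return np.linalg.solve(A.T, grad_x)

L = np.array([[1., -1.], [-1., 1.]])
z = np.array([0.3, -0.7])
g_out = np.array([1.0, 2.0])            # upstream gradient dLoss/dx
x, A = prox_forward(z, L, 0.5)
g_analytic = prox_backward(g_out, A)

eps = 1e-6
g_fd = np.empty_like(z)
for i in range(len(z)):
    zp = z.copy(); zp[i] += eps
    xp, _ = prox_forward(zp, L, 0.5)
    g_fd[i] = g_out @ (xp - x) / eps    # directional finite difference
print(np.allclose(g_analytic, g_fd, atol=1e-4))  # prints True
```

Since the layer is an argmin with a closed form, gradients flow through it like any other layer, so the feature extractors and predictors can be trained jointly.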
SLIDE 13

Experiment Results

Multi-view image classification

  • Dataset: a subset of Sketchy (20 classes)

Test accuracy for image classification

[Figure: example image pairs labeled 'butterfly', 'cat', …]
SLIDE 14

Experiment Results

Cross-lingual word embedding

  • Dataset: WS353, SimLex999
  • Metric: Spearman's correlation between the rankings produced by the model and by humans

Table 1: Spearman's correlation for word similarities (English, German)
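The evaluation metric can be sketched as follows (a plain tie-free implementation of Spearman's rank correlation on hypothetical scores; benchmark evaluations typically use `scipy.stats.spearmanr`):

```python
import numpy as np

def spearman(a, b):
    """Spearman's rank correlation: Pearson correlation of the ranks
    (tie-free sketch). Compares the model's word-similarity ranking
    against human similarity judgements."""
    ra = np.argsort(np.argsort(a)).astype(float)   # rank of each entry
    rb = np.argsort(np.argsort(b)).astype(float)
    ra -= ra.mean(); rb -= rb.mean()
    return float(ra @ rb / np.sqrt((ra @ ra) * (rb @ rb)))

model_sims = [0.9, 0.2, 0.5, 0.7]   # hypothetical model similarity scores
human_sims = [9.1, 1.5, 4.0, 8.2]   # hypothetical human judgements
print(spearman(model_sims, human_sims))  # identical rankings give 1.0
```

Because only the rankings matter, the metric is invariant to any monotone rescaling of the model's similarity scores.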

SLIDE 15

At the poster: more details and discussions. Thanks!

MetaProx — related work on the modeling and optimization sides:
  • “Efficient Meta Learning via Minibatch Proximal Update” (NeurIPS 2019)
  • “Meta-Learning with Implicit Gradients” (NeurIPS 2019)