SLIDE 1
CS 330

Optimization-Based Meta-Learning (finishing from last time)
and Non-Parametric Few-Shot Learning

SLIDE 2

Logistics

  • Homework 1 due, Homework 2 out this Wednesday
  • Fill out poster presentation preferences! (Tues 12/3 or Weds 12/4)
  • Course project details & suggestions posted; proposal due Monday 10/28

SLIDE 3

Plan for Today

Optimization-Based Meta-Learning
  • Recap & discuss advanced topics

Non-Parametric Few-Shot Learning
  • Siamese networks, matching networks, prototypical networks

Properties of Meta-Learning Algorithms
  • Comparison of approaches

SLIDE 4

Recap from Last Time

MAML optimizes, over the meta-parameters θ, the post-adaptation test loss across tasks:

min_θ Σ_{task i} L( θ − α ∇_θ L(θ, D^tr_i), D^ts_i )

Fine-tuning, starting from pre-trained parameters θ on the training data D^tr for a new task:

φ ← θ − α ∇_θ L(θ, D^tr)

MAML optimizes for an effective initialization for fine-tuning.
Discussed: performance on extrapolated tasks, expressive power.
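The bi-level objective above can be sketched end-to-end on toy linear-regression tasks. This is a minimal illustration, not the course's reference implementation: the finite-difference meta-gradient stands in for backpropagating through the inner step, and the task distribution, model, and hyperparameters are all assumptions chosen for the sketch.

```python
import numpy as np

def loss_and_grad(theta, X, y):
    """Squared-error loss and gradient for a linear model y ~ X @ theta."""
    resid = X @ theta - y
    return np.mean(resid ** 2), 2 * X.T @ resid / len(y)

def meta_loss(theta, tasks, alpha=0.1):
    """MAML objective: sum_i L(theta - alpha * grad_theta L(theta, Dtr_i), Dts_i)."""
    total = 0.0
    for (Xtr, ytr), (Xts, yts) in tasks:
        _, g = loss_and_grad(theta, Xtr, ytr)
        phi = theta - alpha * g                    # inner adaptation step (fine-tuning)
        total += loss_and_grad(phi, Xts, yts)[0]   # evaluate adapted params on test split
    return total

def meta_gradient(theta, tasks, eps=1e-5):
    """Finite-difference meta-gradient (stands in for second-order backprop)."""
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        e = np.zeros_like(theta); e[j] = eps
        g[j] = (meta_loss(theta + e, tasks) - meta_loss(theta - e, tasks)) / (2 * eps)
    return g

rng = np.random.default_rng(0)

def make_task():
    # Task-specific true weights drawn near a shared mean [1, -2] (illustrative).
    w = np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(2)
    X = rng.standard_normal((10, 2))
    return (X[:5], X[:5] @ w), (X[5:], X[5:] @ w)

tasks = [make_task() for _ in range(8)]
theta = np.zeros(2)
for _ in range(200):                               # outer loop: optimize the initialization
    theta -= 0.01 * meta_gradient(theta, tasks)
```

After meta-training, one inner gradient step from θ solves each task well, because θ has converged near the shared structure of the task distribution.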

SLIDE 5

Probabilistic Interpretation of Optimization-Based Inference

Key idea: Acquire φ through optimization.

One form of prior knowledge: the initialization for fine-tuning. The meta-parameters θ serve as a prior, and the task-specific parameters are a MAP estimate under that prior (empirical Bayes). (Grant et al., ICLR '18)

How to compute the MAP estimate? Gradient descent with early stopping = MAP inference under a Gaussian prior with mean at the initial parameters [Santos '96] (exact in the linear case, approximate in the nonlinear case).

MAML approximates hierarchical Bayesian inference.

SLIDE 6

Optimization-Based Inference

Key idea: Acquire φ through optimization.

One form of prior knowledge: the initialization for fine-tuning; the meta-parameters serve as a prior. Gradient descent + early stopping (MAML) corresponds to an implicit Gaussian prior:

φ ← θ − α ∇_θ L(θ, D^tr)

Other forms of priors?
  • Gradient descent with an explicit Gaussian prior (Rajeswaran et al., implicit MAML '19)
  • Bayesian linear regression on learned features (Harrison et al., ALPaCA '18)
  • Closed-form or convex optimization on learned features:
    ridge regression, logistic regression (Bertinetto et al., R2-D2 '19);
    support vector machine (Lee et al., MetaOptNet '19), current SOTA on few-shot image classification
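A sketch of the "closed-form optimization on learned features" idea: with a ridge-regression head (in the spirit of R2-D2), the inner loop has an exact solution, so no inner gradient steps are needed. The feature map below is a fixed random projection standing in for a meta-learned encoder; all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
W_feat = rng.standard_normal((4, 8))          # stand-in for a learned feature encoder

def features(X):
    return np.tanh(X @ W_feat)                # f_theta(x): embedding of the input

def ridge_head(Xtr, Ytr, lam=1.0):
    """Closed-form inner loop: ridge regression on embedded support data."""
    Phi = features(Xtr)                       # (K, d) support embeddings
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ Ytr)    # task-specific weights, no SGD needed

# One few-shot "task": binary labels regressed onto one-hot targets.
Xtr = rng.standard_normal((10, 4))
ytr = (Xtr[:, 0] > 0).astype(int)
Ytr = np.eye(2)[ytr]
W_task = ridge_head(Xtr, Ytr)

def predict(X):
    return features(X) @ W_task               # logits; argmax gives the class
```

Because the inner solve is differentiable in closed form, the outer loop can backpropagate through it into the encoder without unrolling gradient steps.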

SLIDE 7

Optimization-Based Inference

Key idea: Acquire φ through optimization.

Challenge: How to choose an architecture that is effective for the inner gradient step?
Idea: Progressive neural architecture search + MAML (Kim et al., Auto-Meta)
  • finds a highly non-standard architecture (deep & narrow)
  • different from architectures that work well for standard supervised learning

MiniImagenet, 5-way 5-shot accuracy: MAML with the basic architecture: 63.11%; MAML + AutoMeta: 74.65%

SLIDE 8

Optimization-Based Inference

Key idea: Acquire φ through optimization.

Challenge: Bi-level optimization can exhibit instabilities.
Idea: Automatically learn an inner vector learning rate, tune the outer learning rate (Li et al., Meta-SGD; Behl et al., AlphaMAML)
Idea: Decouple the inner learning rate, BN statistics per-step (Antoniou et al., MAML++)
Idea: Optimize only a subset of the parameters in the inner loop (Zhou et al., DEML; Zintgraf et al., CAVIA)
Idea: Introduce context variables for increased expressive power (Finn et al., bias transformation; Zintgraf et al., CAVIA)

Takeaway: a range of simple tricks can help optimization significantly.
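The "learn an inner vector learning rate" trick (Meta-SGD) replaces the scalar α with a per-parameter vector that is itself meta-learned in the outer loop. A minimal sketch of just the inner update; the values below are illustrative, not learned here.

```python
import numpy as np

def inner_step(theta, alpha_vec, grad):
    """Meta-SGD-style inner update: per-parameter learned step sizes.

    alpha_vec has the same shape as theta and is treated as a meta-parameter,
    updated in the outer loop alongside theta (outer loop not shown)."""
    return theta - alpha_vec * grad           # elementwise, unlike scalar-alpha MAML

theta = np.array([0.5, -1.0, 2.0])
alpha_vec = np.array([0.1, 0.01, 0.5])        # meta-learned in practice; illustrative here
grad = np.array([1.0, 1.0, 1.0])
phi = inner_step(theta, alpha_vec, grad)
```

Because α is a vector, the outer loop can effectively turn adaptation off for some parameters (tiny α entries) and make others adapt aggressively, which is one way these methods stabilize the inner loop.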

SLIDE 9

Optimization-Based Inference

Key idea: Acquire φ through optimization.

Challenge: Backpropagating through many inner gradient steps is compute- and memory-intensive.
Can we compute the meta-gradient without differentiating through the optimization path? -> whiteboard

Idea: [Crudely] approximate the inner-loop Jacobian as the identity (Finn et al., first-order MAML '17; Nichol et al., Reptile '18)
Takeaway: works for simple few-shot problems, but (anecdotally) not for more complex meta-learning problems.

Idea: Derive the meta-gradient using the implicit function theorem (Rajeswaran, Finn, Kakade, Levine, implicit MAML '19)
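The "approximate the Jacobian as the identity" family avoids second-order terms entirely. A Reptile-style sketch on toy linear-regression tasks: run plain SGD per task, then move the initialization toward the adapted weights. The task distribution and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def grad(theta, X, y):
    """Gradient of mean squared error for a linear model."""
    return 2 * X.T @ (X @ theta - y) / len(y)

def reptile_step(theta, X, y, inner_lr=0.1, inner_steps=5, outer_lr=0.5):
    """Reptile: adapt with plain SGD on one task, then move the initialization
    toward the adapted weights -- no differentiation through the inner loop."""
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= inner_lr * grad(phi, X, y)
    return theta + outer_lr * (phi - theta)   # first-order meta-update

theta = np.zeros(2)
for _ in range(300):                          # meta-training over freshly sampled tasks
    w = np.array([1.0, -2.0]) + 0.1 * rng.standard_normal(2)
    X = rng.standard_normal((5, 2))
    theta = reptile_step(theta, X, X @ w)
```

The initialization drifts toward the center of the task distribution, matching the slide's point that first-order approximations suffice for simple few-shot problems.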
SLIDE 10

Optimization-Based Inference

Idea: Derive the meta-gradient using the implicit function theorem, computing it without differentiating through the optimization path (Rajeswaran, Finn, Kakade, Levine, implicit MAML '19)
  • Memory and computation trade-offs
  • Allows for second-order optimizers in the inner loop
  • A very recent development (NeurIPS '19), thus all the typical caveats with recent work

SLIDE 11

Optimization-Based Inference

Key idea: Acquire φ through optimization.
Takeaways: Construct a bi-level optimization problem.

+ positive inductive bias at the start of meta-learning
+ consistent procedure, tends to extrapolate better
+ maximally expressive with a sufficiently deep network
+ model-agnostic (easy to combine with your favorite architecture)
- typically requires second-order optimization
- usually compute- and/or memory-intensive

Can we embed a learning procedure without second-order optimization?

SLIDE 12

Plan for Today

Optimization-Based Meta-Learning
  • Recap & discuss advanced topics

Non-Parametric Few-Shot Learning
  • Siamese networks, matching networks, prototypical networks

Properties of Meta-Learning Algorithms
  • Comparison of approaches

SLIDE 13

So far: learning parametric models. Can we use parametric meta-learners that produce effective non-parametric learners?

At meta-test time, few-shot learning means a low-data regime, and in low-data regimes non-parametric methods are simple and work well. During meta-training, we still want to be parametric.

Note: some of these methods precede the parametric approaches.

SLIDE 14

Non-parametric methods

Key idea: Use a non-parametric learner. Compare the test datapoint with the training images in D^tr_i.
In what space do you compare? With what distance metric? Pixel space, l2 distance?

SLIDE 15

In what space do you compare? With what distance metric? Pixel space with l2 distance is a poor perceptual metric (Zhang et al., arXiv 1801.03924).

SLIDE 16

Non-parametric methods

Key idea: Use a non-parametric learner. Compare the test datapoint with the training images in D^tr_i.
In what space do you compare? With what distance metric? Pixel space, l2 distance?
Learn to compare using meta-training data!

SLIDE 17

Non-parametric methods (Koch et al., ICML '15)

Key idea: Use a non-parametric learner.
Train a Siamese network to predict whether or not two images are the same class (label 1 if same class, 0 otherwise).


SLIDE 20

Non-parametric methods (Koch et al., ICML '15)

Key idea: Use a non-parametric learner.
Train a Siamese network to predict whether or not two images are the same class.
At meta-test time: compare the test image to each image in D^tr_j.
Meta-training is binary classification; meta-test is N-way classification. Can we match meta-train & meta-test?
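At meta-test time the binary verification network is reused for N-way classification: score the query against every support example and take the class of the best match. A sketch with a stand-in embedding and similarity; the encoder, the scoring function, and all names here are illustrative assumptions, not the trained Siamese network itself.

```python
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((4, 6))               # stand-in for the learned Siamese encoder

def embed(x):
    return np.tanh(x @ W)

def same_class_score(x1, x2):
    """Verification score: trained to be high iff x1, x2 share a class.
    Here, negative embedding distance serves as an illustrative similarity."""
    return -np.linalg.norm(embed(x1) - embed(x2))

def predict_nway(x_query, support_x, support_y):
    """Meta-test: compare the query to each image in the support set, pick the best match."""
    scores = [same_class_score(x_query, xs) for xs in support_x]
    return support_y[int(np.argmax(scores))]
```

Note the train/test mismatch the slide points out: the scorer was trained on binary same/different pairs, yet at test time its scores are ranked across N classes.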

SLIDE 21

Non-parametric methods (Vinyals et al., Matching Networks, NeurIPS '16)

Key idea: Use a non-parametric learner.
Can we match meta-train & meta-test? Perform nearest-neighbor matching in a learned embedding space: embed the support set D^tr_i with a bidirectional LSTM and the test point from D^ts_i with a convolutional encoder, then predict

ŷ^ts = Σ_{(x_k, y_k) ∈ D^tr} f_θ(x^ts, x_k) y_k

Trained end-to-end. Meta-train & meta-test time match.

SLIDE 22

Non-parametric methods

Key idea: Use a non-parametric learner.

General algorithm:
  1. Sample task T_i (or a mini-batch of tasks)
  2. Sample disjoint datasets D^tr_i, D^test_i from D_i
  3. Amortized approach: compute φ_i ← f_θ(D^tr_i);
     non-parametric approach (matching networks): compute ŷ^ts = Σ_{(x_k, y_k) ∈ D^tr_i} f_θ(x^ts, x_k) y_k
  4. Amortized: update θ using ∇_θ L(φ_i, D^test_i);
     non-parametric: update θ using ∇_θ L(ŷ^ts, y^ts)

(The task-specific parameters are integrated out, hence non-parametric.)

What if > 1 shot? Matching networks perform the comparisons independently. Can we aggregate class information to create a prototypical embedding?
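The matching-networks prediction ŷ^ts = Σ_k f_θ(x^ts, x_k) y_k can be sketched with f_θ as an attention kernel: a softmax over similarities between the query embedding and each support embedding. The encoder here is a fixed random stand-in for the learned one, and the negative-distance similarity is an illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.standard_normal((4, 6))               # stand-in for the learned encoders

def embed(x):
    return np.tanh(x @ W)

def matching_predict(x_ts, Xtr, Ytr_onehot):
    """y_hat = sum_k f_theta(x_ts, x_k) y_k, with f_theta an attention kernel:
    softmax over negative embedding distances to the support points."""
    sims = -np.linalg.norm(embed(Xtr) - embed(x_ts), axis=1)
    attn = np.exp(sims - sims.max())
    attn /= attn.sum()                        # attention weights over the support set
    return attn @ Ytr_onehot                  # convex combination of support labels

Xtr = rng.standard_normal((6, 4))             # 3-way, 2-shot support set
ytr = np.array([0, 0, 1, 1, 2, 2])
Ytr = np.eye(3)[ytr]
probs = matching_predict(Xtr[3], Xtr, Ytr)    # query = a support point, for illustration
```

Each support point contributes independently, which is exactly the > 1-shot limitation the slide raises: nothing aggregates the per-class evidence before comparison.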

SLIDE 23

Non-parametric methods (Snell et al., Prototypical Networks, NeurIPS '17)

Key idea: Use a non-parametric learner.

c_n = (1/K) Σ_{(x,y) ∈ D^tr_i} 1(y = n) f_θ(x)

p_θ(y = n | x) = exp(−d(f_θ(x), c_n)) / Σ_{n'} exp(−d(f_θ(x), c_{n'}))

d: Euclidean or cosine distance
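The two equations above translate almost line-for-line into code: average each class's support embeddings into a prototype, then classify by a softmax over negative distances to the prototypes. The encoder is a fixed random stand-in for the learned f_θ; all specifics are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
W = rng.standard_normal((4, 6))               # stand-in for the learned encoder f_theta

def f_theta(X):
    return np.tanh(X @ W)

def prototypes(Xtr, ytr, n_classes):
    """c_n = (1/K) * sum of f_theta(x) over support points of class n."""
    Z = f_theta(Xtr)
    return np.stack([Z[ytr == n].mean(axis=0) for n in range(n_classes)])

def predict_proba(x, protos):
    """p(y = n | x) = softmax_n of -d(f_theta(x), c_n), with d = Euclidean distance."""
    d = np.linalg.norm(f_theta(x[None]) - protos, axis=1)
    logits = -d
    e = np.exp(logits - logits.max())
    return e / e.sum()

Xtr = rng.standard_normal((9, 4))             # 3-way, 3-shot support set
ytr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
protos = prototypes(Xtr, ytr, 3)
```

Averaging into prototypes is what lets this handle K > 1 shots gracefully, in contrast to matching networks' independent comparisons.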

SLIDE 24

Non-parametric methods

So far: Siamese networks, matching networks, prototypical networks. Embed, then apply nearest neighbors.

Challenge: What if you need to reason about more complex relationships between datapoints?
Idea: Learn a non-linear relation module on embeddings, i.e. learn d in PN (Sung et al., Relation Net)
Idea: Learn an infinite mixture of prototypes (Allen et al., IMP, ICML '19)
Idea: Perform message passing on embeddings (Garcia & Bruna, GNN)

SLIDE 25

Plan for Today

Optimization-Based Meta-Learning
  • Recap & discuss advanced topics

Non-Parametric Few-Shot Learning
  • Siamese networks, matching networks, prototypical networks

Properties of Meta-Learning Algorithms
  • Comparison of approaches

How can we think about how these methods compare?

SLIDE 26

Black-box vs. Optimization vs. Non-Parametric

Computation graph perspective:

Black-box: y^ts = f_θ(D^tr_i, x^ts)
Optimization-based: y^ts = f_{φ_i}(x^ts), with φ_i ← θ − α ∇_θ L(θ, D^tr_i)
Non-parametric: y^ts = softmax(−d(f_θ(x^ts), c_n)), where c_n = (1/K) Σ_{(x,y) ∈ D^tr_i} 1(y = n) f_θ(x)

Note: (again) you can mix & match components of the computation graph:
  • Both condition on data & run gradient descent (Jiang et al., CAML '19)
  • MAML, but initialize the last layer as a ProtoNet during meta-training (Triantafillou et al., Proto-MAML '19)
  • Gradient descent on a relation net embedding (Rusu et al., LEO '19)

SLIDE 27

Black-box vs. Optimization vs. Non-Parametric

Algorithmic properties perspective. Recall:

Expressive power: the ability of f to represent a range of learning procedures.
  Why it matters: scalability, applicability to a range of domains.
Consistency: the learned learning procedure will solve the task given enough data.
  Why it matters: reduced reliance on meta-training tasks, good OOD task performance.

These properties are important for most applications!

SLIDE 28

Black-box vs. Optimization vs. Non-Parametric

Black-box:
+ complete expressive power
+ easy to combine with a variety of learning problems (e.g. SL, RL)
- not consistent
- challenging optimization (no inductive bias at the initialization)
- often data-inefficient

Optimization-based:
+ consistent, reduces to GD
~ expressive for very deep models*
+ positive inductive bias at the start of meta-learning
+ handles varying & large K well
+ model-agnostic
- second-order optimization
- usually compute- and memory-intensive

Non-parametric:
+ expressive for most architectures
~ consistent under certain conditions
+ entirely feedforward
+ computationally fast & easy to optimize
- harder to generalize to varying K
- hard to scale to very large K
- so far, limited to classification

*for supervised learning settings

Generally, well-tuned versions of each perform comparably on existing few-shot benchmarks! (This likely says more about the benchmarks than the methods.) Which method to use depends on your use case.

SLIDE 29

Black-box vs. Optimization vs. Non-Parametric

Algorithmic properties perspective:

Expressive power: the ability of f to represent a range of learning procedures.
  Why it matters: scalability, applicability to a range of domains.
Consistency: the learned learning procedure will solve the task given enough data.
  Why it matters: reduced reliance on meta-training tasks, good OOD task performance.
Uncertainty awareness: the ability to reason about ambiguity during learning.
  Why it matters: active learning, calibrated uncertainty, principled Bayesian approaches, RL. We'll discuss this next time!

SLIDE 30

Reminders

  • Homework 1 due, Homework 2 out this Wednesday
  • Fill out poster presentation preferences! (Tues 12/3 or Weds 12/4)
  • Course project details & suggestions posted; proposal due Monday 10/28