Non-Parametric Few-Shot Learning
CS 330

Logistics
- Homework 1 due tonight, Homework 2 out soon
- Fill out the project group form if you haven't already
- Project suggestions & project spreadsheet posted

Plan for Today
Non-Parametric Few-Shot Learning
- Siamese networks, matching networks, prototypical networks
- Case study of few-shot medical image diagnosis
Properties of Meta-Learning Algorithms
- Comparison of approaches
Example Meta-Learning Applications
- Imitation learning, drug discovery, motion prediction, language generation
Goals by the end of lecture:
- Basics of non-parametric few-shot learning techniques (& how to implement)
- Trade-offs between black-box, optimization-based, and non-parametric meta-learning
- Familiarity with applied formulations of meta-learning

Recap: Black-Box Meta-Learning
[Figure: f_θ reads D^tr_i and outputs φ_i, which is used to predict y^ts from x^ts]
Key idea: parametrize learner as a neural network
- challenging optimization problem
+ expressive

Recap: Optimization-Based Meta-Learning
[Figure: φ_i is obtained from D^tr_i via ∇_θ L and used to predict y^ts from x^ts]
Key idea: embed optimization inside the inner learning process
+ structure of optimization embedded into meta-learner
- typically requires second-order optimization
Today: Can we embed a learning procedure without a second-order optimization?

So far: Learning parametric models.
In low data regimes, non-parametric methods are simple and work well.
During meta-test time: few-shot learning <-> low data regime
During meta-training: still want to be parametric
Can we use parametric meta-learners that produce effective non-parametric learners?
Note: some of these methods precede parametric approaches

Non-parametric methods
Key Idea: Use non-parametric learner.
[Figure: test datapoint alongside training data D^tr_i]
Compare test image with training images.
In what space do you compare? With what distance metric?
pixel space, ℓ2 distance?
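As a concrete baseline for the question above, here is a minimal sketch of nearest-neighbor classification in raw pixel space with ℓ2 distance. The function names and the toy 4-pixel "images" are hypothetical and not from the lecture; images are assumed to be flattened into equal-length lists of floats.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length flat pixel vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest_neighbor_label(x_test, train_set):
    """Return the label of the training image closest to x_test in pixel space."""
    _, label = min(train_set, key=lambda pair: l2_distance(x_test, pair[0]))
    return label

# Toy 4-pixel "images" (made-up data for illustration only).
train_set = [
    ([0.0, 0.0, 0.1, 0.0], "dark"),
    ([1.0, 0.9, 1.0, 1.0], "bright"),
]
prediction = nearest_neighbor_label([0.8, 1.0, 0.9, 0.7], train_set)
```

Nothing here is learned, which is exactly the weakness the slides go on to address: pixel-space ℓ2 distance need not agree with perceptual or semantic similarity.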

In what space do you compare? With what distance metric?
pixel space, ℓ2 distance?
Zhang et al. (arXiv 1801.03924)

Non-parametric methods
Key Idea: Use non-parametric learner.
[Figure: test datapoint alongside training data D^tr_i]
Compare test image with training images.
In what space do you compare? With what distance metric?
pixel space, ℓ2 distance?
Learn to compare using meta-training data!

Non-parametric methods
Key Idea: Use non-parametric learner.
Train Siamese network to predict whether or not two images are the same class.
[Figure: example image pair, label 0]
Koch et al., ICML '15

Non-parametric methods
Key Idea: Use non-parametric learner.
Train Siamese network to predict whether or not two images are the same class.
[Figure: example image pair, label 1]
Meta-test time: compare the test image to each image in D^tr_j
Meta-training: binary classification
Meta-test: N-way classification
Can we match meta-train & meta-test?
Koch et al., ICML '15
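The meta-test procedure described above (compare the test image to every support image, then take the label of the best match) can be sketched as follows. The function `same_class_probability` is a hypothetical stand-in for a trained Siamese network: a real one would embed both inputs with shared weights and output a learned probability, whereas this placeholder uses a sigmoid of (1 - L1 distance) just so the example runs.

```python
import math

def same_class_probability(x_a, x_b):
    """Hypothetical stand-in for a trained Siamese network's output.

    Returns a value in (0, 1) that decreases as the inputs move apart.
    """
    l1 = sum(abs(p - q) for p, q in zip(x_a, x_b))
    return 1.0 / (1.0 + math.exp(l1 - 1.0))

def siamese_classify(x_test, support_set):
    """N-way prediction built from a binary same/different verifier."""
    _, best_label = max(
        support_set, key=lambda pair: same_class_probability(x_test, pair[0])
    )
    return best_label

# One support example per class (made-up 2-D inputs).
support_set = [([0.0, 0.0], "a"), ([1.0, 1.0], "b"), ([2.0, 2.0], "c")]
predicted = siamese_classify([0.9, 1.2], support_set)
```

The mismatch the slide points out is visible in the code: the verifier is trained on pairs (binary classification), but `siamese_classify` uses it for an N-way decision it was never directly optimized for.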

Non-parametric methods
Key Idea: Use non-parametric learner.
Can we match meta-train & meta-test?
Nearest neighbors in learned embedding space:
$\hat{y}^{ts} = \sum_{(x_k, y_k) \in D^{tr}_i} f_\theta(x^{ts}, x_k)\, y_k$
[Figure: D^tr_i is embedded with a bidirectional LSTM and D^ts_i with a convolutional encoder]
Trained end-to-end.
Meta-train & meta-test time match.
Vinyals et al. Matching Networks, NeurIPS '16
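A minimal sketch of the weighted-sum prediction above, assuming the embeddings have already been computed (the encoders are omitted) and assuming the comparison function f_θ is a softmax over cosine similarities, which is the attention kernel used in the Matching Networks paper. All names are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def matching_predict(z_test, support_embeddings, support_labels, num_classes):
    """y_hat = sum_k a(z_test, z_k) * onehot(y_k), with a = softmax of cosine similarity."""
    sims = [cosine_similarity(z_test, z_k) for z_k in support_embeddings]
    max_sim = max(sims)  # subtract the max for numerical stability
    exp_sims = [math.exp(s - max_sim) for s in sims]
    total = sum(exp_sims)
    y_hat = [0.0] * num_classes
    for weight, label in zip(exp_sims, support_labels):
        y_hat[label] += weight / total
    return y_hat

# Three support embeddings with integer class labels (made-up values).
support_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
support_labels = [0, 1, 0]
y_hat = matching_predict([1.0, 0.05], support_embeddings, support_labels, num_classes=2)
```

Because the output is a distribution over classes, the same computation can be used with a cross-entropy loss during meta-training and for prediction at meta-test time, which is what makes the two phases match.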

Non-parametric methods
Key Idea: Use non-parametric learner.
General Algorithm:
Black-box approach
1. Sample task T_i (or mini batch of tasks)
2. Sample disjoint datasets D^tr_i, D^test_i from D_i
3. Compute φ_i ← f_θ(D^tr_i)
4. Update θ using ∇_θ L(φ_i, D^test_i)
Non-parametric approach (matching networks)
1. Sample task T_i (or mini batch of tasks)
2. Sample disjoint datasets D^tr_i, D^test_i from D_i
3. Compute $\hat{y}^{ts} = \sum_{(x_k, y_k) \in D^{tr}} f_\theta(x^{ts}, x_k)\, y_k$ (parameters φ integrated out, hence non-parametric)
4. Update θ using ∇_θ L(ŷ^ts, y^ts)
Matching networks will perform comparisons independently.
What if >1 shot? Can we aggregate class information to create a prototypical embedding?
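Steps 1 and 2 of the algorithm above (sample a task, then split it into disjoint training and test sets) can be sketched as an episode sampler. The dataset layout (a dict from class name to a list of examples) and every name below are assumptions made for illustration.

```python
import random

def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Sample one task: N classes with disjoint support (D_tr) and query (D_test) sets.

    `dataset` maps class name -> list of examples (hypothetical layout).
    Labels are re-indexed to 0..n_way-1 within the episode.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        # Sampling without replacement keeps support and query disjoint.
        examples = rng.sample(dataset[cls], k_shot + n_query)
        support += [(x, episode_label) for x in examples[:k_shot]]
        query += [(x, episode_label) for x in examples[k_shot:]]
    return support, query

rng = random.Random(0)
toy_dataset = {f"class_{c}": [f"img_{c}_{i}" for i in range(10)] for c in range(6)}
support, query = sample_episode(toy_dataset, n_way=3, k_shot=2, n_query=4, rng=rng)
```

In a training loop, steps 3 and 4 would then compute predictions for the query inputs from the support set and take a gradient step on θ; those steps depend on the chosen model and are left out here.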

Non-parametric methods
Key Idea: Use non-parametric learner.
$c_n = \frac{1}{K} \sum_{(x, y) \in D^{tr}_i} \mathbb{1}(y = n)\, f_\theta(x)$
$p_\theta(y = n \mid x) = \frac{\exp(-d(f_\theta(x), c_n))}{\sum_{n'} \exp(-d(f_\theta(x), c_{n'}))}$
d: Euclidean or cosine distance
Snell et al. Prototypical Networks, NeurIPS '17
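The two formulas above (class prototype as the mean support embedding, then a softmax over negative distances) translate fairly directly into code. This is a sketch under assumptions: embeddings f_θ(x) are taken as precomputed lists of floats, and d is chosen to be squared Euclidean distance; the function names are hypothetical.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def class_prototypes(support_embeddings, support_labels, num_classes):
    """c_n = mean of the support embeddings whose label is n."""
    return [
        mean_vector([z for z, y in zip(support_embeddings, support_labels) if y == n])
        for n in range(num_classes)
    ]

def class_probabilities(z_query, prototypes):
    """p(y = n | x) = softmax over n of -d(f_theta(x), c_n)."""
    logits = [-squared_euclidean(z_query, c) for c in prototypes]
    max_logit = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - max_logit) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# 2-way, 2-shot toy episode (made-up embeddings).
support_embeddings = [[0.0, 0.0], [0.0, 2.0], [4.0, 4.0], [6.0, 4.0]]
support_labels = [0, 0, 1, 1]
prototypes = class_prototypes(support_embeddings, support_labels, num_classes=2)
probs = class_probabilities([0.0, 1.0], prototypes)
```

Note that `class_probabilities` accepts any number of prototypes, so the number of classes at test time does not have to equal the number used in training episodes.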

Non-parametric methods
So far: Siamese networks, matching networks, prototypical networks
Embed, then nearest neighbors.
Challenge: What if you need to reason about more complex relationships between datapoints?
- Idea: Learn non-linear relation module on embeddings (learn d in PN). Sung et al. Relation Net
- Idea: Learn infinite mixture of prototypes. Allen et al. IMP, ICML '19
- Idea: Perform message passing on embeddings. Garcia & Bruna, GNN

Case Study
Machine Learning for Healthcare Conference 2019
NeurIPS 2018 ML4H Workshop
Link: https://arxiv.org/abs/1811.03066

Problem: Few-Shot Learning for Dermatological Disease Diagnosis
Dermnet dataset (http://www.dermnet.com/)
Challenges:
- hard to get data
- data is long-tailed
- significant intra-class variability
Goal: Acquire accurate classifier on all classes
(Top 200 classes only!)

Prototypical Clustering Networks for Few-Shot Classification
Problem formulation:
- different image classes = different diseases
- 150 base classes (classes w/ most data)
- 50 novel classes
- Test on all 200 classes.
Approach: Prototypical Networks +
- learn multiple prototypes per class (to handle intra-class variability)
- incorporate unlabeled support examples via k-means on learned embedding
Note: Unlike black-box & optimization-based meta-learning, ProtoNets can train for N-way classification and test for > N-way classification.
(Side note if you read the paper: They flipped the standard notation of K and N in the paper.)
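To illustrate why several prototypes per class can help with intra-class variability, here is a rough sketch that runs plain k-means on each class's embeddings and classifies by the nearest prototype. This is not the paper's exact procedure (in particular it ignores unlabeled examples and any learned components); the names, the deterministic initialization, and the toy data are all assumptions.

```python
def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def kmeans(points, k, iterations=10):
    """Plain Lloyd's k-means, initialized from the first k points for determinism."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_vector(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers

def classify(z, prototypes_by_class):
    """Predict the class that owns the single nearest prototype."""
    return min(
        prototypes_by_class,
        key=lambda cls: min(squared_distance(z, c) for c in prototypes_by_class[cls]),
    )

# Class "A" has two visually distinct modes; class "B" sits between them (made-up data).
embeddings = {
    "A": [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]],
    "B": [[5.0, 5.0], [5.0, 6.0]],
}
prototypes_by_class = {"A": kmeans(embeddings["A"], k=2), "B": kmeans(embeddings["B"], k=1)}
label = classify([9.0, 10.0], prototypes_by_class)
```

In this toy data a single mean prototype for class "A" would land at [5.0, 5.5], exactly on top of class "B", whereas two prototypes keep the two modes of "A" separate.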

Evaluation
Compare:
- PN: standard ProtoNets, trained on 150 base classes, pre-trained on ImageNet
- FT_N-*NN: ImageNet pre-training, fine-tuned ResNet on N classes, *-nearest neighbors in resulting embedding space
- FT_200-*CE: ImageNet pre-trained, fine-tuned on all 200 classes with balancing (very strong baseline, accesses more info during training, requires re-training for new classes)
Evaluation metric: mean class accuracy (mca), i.e. average of per-class accuracies across 200 classes.
[Results tables for k = 5 and k = 10]
- PCN > PN
- PCN > FT_N-*NN
- PCN ≈ FT_200-*CE without requiring re-training
More visualizations and analysis in the paper!
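The evaluation metric above, mean class accuracy, averages per-class accuracies so that rare classes count as much as common ones. A minimal sketch, with a hypothetical function name and made-up labels:

```python
from collections import defaultdict

def mean_class_accuracy(y_true, y_pred):
    """Average of per-class accuracies over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Long-tailed toy example: 8 "common" cases, 2 "rare" cases.
y_true = ["common"] * 8 + ["rare"] * 2
y_pred = ["common"] * 8 + ["common", "rare"]
mca = mean_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

On this toy data the overall accuracy is 0.9 while the mean class accuracy is 0.75, which shows why the metric suits long-tailed data: mistakes on the rare class are not hidden by the common one.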

Plan for Today
Non-Parametric Few-Shot Learning
- Siamese networks, matching networks, prototypical networks
- Case study of few-shot medical image diagnosis
Properties of Meta-Learning Algorithms
- Comparison of approaches
Example Meta-Learning Applications
- Imitation learning, drug discovery, motion prediction, language generation
How can we think about how these methods compare?

Black-box vs. Optimization vs. Non-Parametric
Computation graph perspective
Black-box: $y^{ts} = f_\theta(D^{tr}_i, x^{ts})$
Optimization-based: [formula missing from transcript]
Non-parametric: $y^{ts} = \mathrm{softmax}(-d(f_\theta(x^{ts}), c_n))$ where $c_n = \frac{1}{K} \sum_{(x, y) \in D^{tr}_i} \mathbb{1}(y = n)\, f_\theta(x)$
Note: (again) Can mix & match components of computation graph
- Both condition on data & run gradient descent. Jiang et al. CAML '19
- Gradient descent on relation net embedding. Rusu et al. LEO '19
- MAML, but initialize last layer as ProtoNet during meta-training. Triantafillou et al. Proto-MAML '19

Black-box vs. Optimization vs. Non-Parametric
Algorithmic properties perspective
Expressive power: the ability for f to represent a range of learning procedures
Why? scalability, applicability to a range of domains
Consistency: learned learning procedure will monotonically improve with more data
Why? reduce reliance on meta-training tasks, good OOD task performance
Recall: These properties are important for most applications!

Black-box vs. Optimization vs. Non-Parametric
Black-box
+ complete expressive power
- not consistent
+ easy to combine with variety of learning problems (e.g. SL, RL)
- challenging optimization (no inductive bias at the initialization)
- often data-inefficient
Optimization-based
+ consistent, reduces to GD
~ expressive for very deep models*
+ positive inductive bias at the start of meta-learning
+ handles varying & large K well
+ model-agnostic
- second-order optimization
- usually compute and memory intensive
Non-parametric
+ expressive for most architectures
~ consistent under certain conditions
+ entirely feedforward
+ computationally fast & easy to optimize
- harder to generalize to varying K
- hard to scale to very large K
- so far, limited to classification
Generally, well-tuned versions of each perform comparably on existing few-shot benchmarks! (likely says more about the benchmarks than the methods)
Which method to use depends on your use-case.
*for supervised learning settings
