

  1. Non-Parametric Few-Shot Learning (CS 330)

  2. Logistics: Homework 1 due tonight; Homework 2 out soon. Fill out the project group form if you haven't already. Project suggestions & the project spreadsheet are posted.

  3. Plan for Today
     Non-Parametric Few-Shot Learning - Siamese networks, matching networks, prototypical networks - Case study of few-shot medical image diagnosis
     Properties of Meta-Learning Algorithms - Comparison of approaches
     Example Meta-Learning Applications - Imitation learning, drug discovery, motion prediction, language generation
     Goals for the end of lecture: - Basics of non-parametric few-shot learning techniques (& how to implement them) - Trade-offs between black-box, optimization-based, and non-parametric meta-learning - Familiarity with applied formulations of meta-learning

  4. Recap: Black-Box Meta-Learning. [Figure: a neural network $f_\theta$ reads $D^{tr}_i$ and outputs task parameters $\phi_i$, which are used to predict $y^{ts}$ from $x^{ts}$.] Key idea: parametrize the learner as a neural network. + expressive - challenging optimization problem

  5. Recap: Optimization-Based Meta-Learning. [Figure: task parameters $\phi_i$ obtained via gradient steps $\nabla_\theta \mathcal{L}$ on $D^{tr}_i$, then used to predict $y^{ts}$ from $x^{ts}$.] Key idea: embed optimization inside the inner learning process. + structure of optimization - typically requires second-order optimization embedded into the meta-learner. Today: can we embed a learning procedure without second-order optimization?

  6. So far: learning parametric models. In low-data regimes, non-parametric methods are simple and work well. During meta-test time: few-shot learning <-> low-data regime. During meta-training: still want to be parametric. Can we use parametric meta-learners that produce effective non-parametric learners? Note: some of these methods precede the parametric approaches.

  7. Non-parametric methods. Key idea: use a non-parametric learner. [Figure: a test datapoint next to the training data $D^{tr}_i$.] Compare the test image with the training images. In what space do you compare? With what distance metric? Pixel space, l2 distance?
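To make the naive baseline concrete, here is a minimal sketch (in PyTorch; the function name and tensor shapes are illustrative assumptions) of nearest-neighbor classification in raw pixel space with l2 distance:

```python
# Minimal sketch: 1-nearest-neighbor in raw pixel space with l2 distance.
# x_tr: [K, C, H, W] support images with labels y_tr: [K];
# x_ts: [Q, C, H, W] query images. Shapes are illustrative.
import torch

def pixel_nn_predict(x_tr, y_tr, x_ts):
    # Flatten each image to a vector and compute pairwise l2 distances.
    d = torch.cdist(x_ts.flatten(1), x_tr.flatten(1))  # [Q, K]
    # Predict the label of the closest training image.
    return y_tr[d.argmin(dim=-1)]                      # [Q]
```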

  8. In what space do you compare? With what distance metric? Pixel space, l2 distance? As Zhang et al. (arXiv 1801.03924) show, l2 distance in pixel space corresponds poorly to perceptual similarity.

  9. Non-parametric methods. Key idea: use a non-parametric learner. [Figure: a test datapoint next to the training data $D^{tr}_i$.] Compare the test image with the training images. In what space do you compare? With what distance metric? Not pixel space with l2 distance - learn to compare using the meta-training data!

  10. Non-parametric methods. Key idea: use a non-parametric learner. Train a Siamese network to predict whether two images belong to the same class. [Figure: an image pair with label 0.] Koch et al., ICML '15

  11. Non-parametric methods. Key idea: use a non-parametric learner. Train a Siamese network to predict whether two images belong to the same class. [Figure: an image pair with label 1.] Koch et al., ICML '15

  12. Non-parametric methods. Key idea: use a non-parametric learner. Train a Siamese network to predict whether two images belong to the same class. [Figure: an image pair with label 0.] Koch et al., ICML '15

  13. Non-parametric methods. Key idea: use a non-parametric learner. Train a Siamese network to predict whether two images belong to the same class. [Figure: an image pair with label 1.] Meta-test time: compare the test image to each image in $D^{tr}_j$ and predict the class of the best match (see the sketch below). Meta-training: binary classification. Meta-test: N-way classification. Can we match meta-train & meta-test? Koch et al., ICML '15
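A minimal sketch of the Siamese-network idea, assuming PyTorch; the architecture and layer sizes are illustrative, not the exact model of Koch et al.: embed both images with a shared encoder and predict P(same class) from the difference of their embeddings.

```python
# Minimal sketch of Siamese-network verification (architecture illustrative).
import torch
import torch.nn as nn

class SiameseNet(nn.Module):
    def __init__(self, emb_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(          # shared conv encoder
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(), nn.LazyLinear(emb_dim),
        )
        self.head = nn.Linear(emb_dim, 1)      # same/different logit

    def forward(self, x1, x2):
        z1, z2 = self.encoder(x1), self.encoder(x2)
        return self.head(torch.abs(z1 - z2)).squeeze(-1)

# Meta-training step: binary classification on image pairs.
net = SiameseNet()
x1, x2 = torch.randn(8, 1, 28, 28), torch.randn(8, 1, 28, 28)
same = torch.randint(0, 2, (8,)).float()       # 1 if same class, else 0
loss = nn.functional.binary_cross_entropy_with_logits(net(x1, x2), same)
loss.backward()
```

At meta-test time, the query is scored against every image in $D^{tr}_j$ and assigned the class of the highest-scoring support image.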

  14. Non-parametric methods. Key idea: use a non-parametric learner. Can we match meta-train & meta-test? Nearest neighbors in a learned embedding space: a convolutional encoder embeds the test image $x^{ts}$, a bidirectional LSTM embeds the training set $D^{tr}_i$, and the prediction is
      $\hat{y}^{ts} = \sum_{(x_k, y_k) \in D^{tr}_i} f_\theta(x^{ts}, x_k)\, y_k$
      Trained end-to-end; meta-train & meta-test time match. Vinyals et al. Matching Networks, NeurIPS '16
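A minimal sketch of the matching-networks prediction rule above, assuming PyTorch. The attention weights $f_\theta(x^{ts}, x_k)$ are computed here as a softmax over cosine similarities in the learned embedding space; the bidirectional-LSTM full-context embeddings of the original paper are omitted for brevity, and `encoder` is an assumed embedding network:

```python
# Minimal sketch of the matching-networks prediction rule:
# y_hat = sum_k f_theta(x_ts, x_k) * y_k, with softmax-attention weights.
import torch
import torch.nn.functional as F

def matching_predict(encoder, x_tr, y_tr, x_ts, n_way):
    """x_tr: [K, ...] support set, y_tr: [K] labels, x_ts: [Q, ...] queries."""
    z_tr = F.normalize(encoder(x_tr), dim=-1)     # [K, D] unit embeddings
    z_ts = F.normalize(encoder(x_ts), dim=-1)     # [Q, D]
    attn = F.softmax(z_ts @ z_tr.t(), dim=-1)     # [Q, K] cosine attention
    y_onehot = F.one_hot(y_tr, n_way).float()     # [K, N]
    return attn @ y_onehot                        # [Q, N] label distribution
```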

  15. Non-parametric methods. Key idea: use a non-parametric learner. General algorithm (black-box approach vs. non-parametric approach, i.e. matching networks):
      1. Sample task $T_i$ (or a mini-batch of tasks)
      2. Sample disjoint datasets $D^{tr}_i$, $D^{test}_i$ from $D_i$
      3. Black-box: compute $\phi_i \leftarrow f_\theta(D^{tr}_i)$. Non-parametric: compute $\hat{y}^{ts} = \sum_{(x_k, y_k) \in D^{tr}_i} f_\theta(x^{ts}, x_k)\, y_k$ (the task parameters $\phi_i$ are integrated out, hence non-parametric)
      4. Black-box: update $\theta$ using $\nabla_\theta \mathcal{L}(\phi_i, D^{test}_i)$. Non-parametric: update $\theta$ using $\nabla_\theta \mathcal{L}(\hat{y}^{ts}, y^{ts})$
      (A sketch of this loop follows below.) Matching networks perform each comparison independently. What if >1 shot? Can we aggregate class information to create a prototypical embedding?
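A minimal sketch of this meta-training loop for the non-parametric (matching networks) column, reusing `matching_predict` from the previous sketch; `sample_task` is an assumed helper that returns disjoint support and query tensors for one task:

```python
# Minimal sketch of the meta-training loop from slide 15 (non-parametric case).
import torch

def meta_train(encoder, sample_task, n_way, steps=1000, lr=1e-3):
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    for _ in range(steps):
        # Steps 1-2: sample a task and disjoint D_tr, D_test.
        (x_tr, y_tr), (x_ts, y_ts) = sample_task()
        # Step 3: "computing phi_i" is implicit -- the support set itself
        # plays the role of the task parameters (non-parametric).
        probs = matching_predict(encoder, x_tr, y_tr, x_ts, n_way)
        # Step 4: update theta with the gradient of L(y_hat, y_ts).
        loss = torch.nn.functional.nll_loss(torch.log(probs + 1e-8), y_ts)
        opt.zero_grad(); loss.backward(); opt.step()
```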

  16. Non-parametric methods. Key idea: use a non-parametric learner. Prototypes and class probabilities:
      $c_n = \frac{1}{K} \sum_{(x,y) \in D^{tr}_i} \mathbb{1}(y = n)\, f_\theta(x)$
      $p_\theta(y = n \mid x) = \frac{\exp(-d(f_\theta(x), c_n))}{\sum_{n'} \exp(-d(f_\theta(x), c_{n'}))}$
      d: Euclidean or cosine distance. Snell et al. Prototypical Networks, NeurIPS '17
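A minimal sketch of both equations, assuming PyTorch with Euclidean distance; `encoder` is an assumed embedding network $f_\theta$:

```python
# Minimal sketch of prototypical networks: class prototypes are mean
# embeddings of the support examples; classify by softmax over negative
# distances to the prototypes.
import torch

def proto_predict(encoder, x_tr, y_tr, x_ts, n_way):
    z_tr, z_ts = encoder(x_tr), encoder(x_ts)              # [K, D], [Q, D]
    # c_n = (1/K) * sum over {(x, y): y = n} of f_theta(x)
    protos = torch.stack([z_tr[y_tr == n].mean(0) for n in range(n_way)])
    d = torch.cdist(z_ts, protos)                          # [Q, N] distances
    return torch.softmax(-d, dim=-1)                       # p_theta(y = n | x)
```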

  17. Non-parametric methods. So far: Siamese networks, matching networks, prototypical networks - embed, then nearest neighbors. Challenge: what if you need to reason about more complex relationships between datapoints? Idea: learn a non-linear relation module on the embeddings, i.e. learn d in ProtoNets (Sung et al., Relation Net; see the sketch below). Idea: learn an infinite mixture of prototypes (Allen et al., IMP, ICML '19). Idea: perform message passing on embeddings (Garcia & Bruna, GNN).
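A minimal sketch of the first idea, learning the comparison function d as a small network over pairs of embeddings, in the spirit of Sung et al.'s Relation Network; all layer sizes and names are illustrative assumptions:

```python
# Minimal sketch of a learned relation module: concatenate (query, prototype)
# embedding pairs and score the pair with a small MLP instead of a fixed
# distance. Sizes are illustrative.
import torch
import torch.nn as nn

class RelationModule(nn.Module):
    def __init__(self, emb_dim=64, hidden=32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * emb_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid(),   # relation score in [0, 1]
        )

    def forward(self, z_query, z_proto):
        # z_query: [Q, D], z_proto: [N, D] -> relation scores [Q, N]
        q = z_query.unsqueeze(1).expand(-1, z_proto.size(0), -1)
        p = z_proto.unsqueeze(0).expand(z_query.size(0), -1, -1)
        return self.mlp(torch.cat([q, p], dim=-1)).squeeze(-1)
```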

  18. Case Study. Machine Learning for Healthcare Conference 2019; NeurIPS 2018 ML4H Workshop. Link: https://arxiv.org/abs/1811.03066

  19. Problem: Few-Shot Learning for Dermatological Disease Diagnosis. Dermnet dataset (http://www.dermnet.com/). Challenges: - hard to get data - data is long-tailed - significant intra-class variability. Goal: acquire an accurate classifier on all classes (top 200 classes only!)

  20. Prototypical Clustering Networks for Few-Shot Classification. Approach: Prototypical Networks + - learn multiple prototypes per class, to handle intra-class variability (see the sketch below) - incorporate unlabeled support examples via k-means on the learned embedding. Problem formulation: different image classes = different diseases; 150 base classes (the classes with the most data) and 50 novel classes; test on all 200 classes. Note: unlike black-box & optimization-based meta-learning, ProtoNets can train for N-way classification and test for >N-way classification. (Side note if you read the paper: they flipped the standard notation of K and N.)
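A minimal sketch of the multiple-prototypes-per-class idea, assuming PyTorch plus scikit-learn's KMeans; this is a simplification of the paper's procedure (hard cluster assignment, fixed cluster count k), and all names are illustrative:

```python
# Minimal sketch: run k-means on each class's embedded support examples and
# classify queries by the nearest cluster center's class.
import torch
from sklearn.cluster import KMeans

def multi_proto_predict(encoder, x_tr, y_tr, x_ts, n_way, k=3):
    z_tr, z_ts = encoder(x_tr).detach(), encoder(x_ts).detach()
    protos, labels = [], []
    for n in range(n_way):
        z_n = z_tr[y_tr == n].numpy()
        km = KMeans(n_clusters=min(k, len(z_n)), n_init=10).fit(z_n)
        protos.append(torch.tensor(km.cluster_centers_, dtype=z_ts.dtype))
        labels += [n] * km.n_clusters
    d = torch.cdist(z_ts, torch.cat(protos))       # [Q, total clusters]
    return torch.tensor(labels)[d.argmin(dim=-1)]  # nearest-cluster class
```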

  21. Evaluation. Compare:
      PN - standard ProtoNets, trained on the 150 base classes, pre-trained on ImageNet
      FT N-*NN - ImageNet pre-training, ResNet fine-tuned on N classes, *-nearest neighbors in the resulting embedding space
      FT 200-*CE - ImageNet pre-trained, fine-tuned on all 200 classes with balancing (a very strong baseline: it accesses more information during training, but requires re-training for new classes)
      Evaluation metric: mean class accuracy (mca), i.e. the average of per-class accuracies across the 200 classes (see the sketch below). Results for k = 5 and k = 10: PCN > PN; PCN > FT N-*NN; PCN ≈ FT 200-*CE, without requiring re-training. More visualizations and analysis in the paper!
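A minimal sketch of the metric, assuming NumPy arrays of true and predicted labels; classes absent from the evaluation set are skipped. Averaging per-class accuracies, rather than overall accuracy, prevents the head of the long-tailed distribution from dominating the score:

```python
# Minimal sketch of mean class accuracy (mca): the unweighted average of
# per-class accuracies.
import numpy as np

def mean_class_accuracy(y_true, y_pred, n_classes):
    accs = [np.mean(y_pred[y_true == c] == c)
            for c in range(n_classes) if np.any(y_true == c)]
    return float(np.mean(accs))
```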

  22. Plan for Today
      Non-Parametric Few-Shot Learning - Siamese networks, matching networks, prototypical networks - Case study of few-shot medical image diagnosis
      Properties of Meta-Learning Algorithms - Comparison of approaches
      Example Meta-Learning Applications - Imitation learning, drug discovery, motion prediction, language generation
      How can we think about how these methods compare?

  23. Black-box vs. Optimization-based vs. Non-Parametric: a computation graph perspective.
      Black-box: $y^{ts} = f_\theta(D^{tr}_i, x^{ts})$
      Optimization-based: $y^{ts} = f_{\phi_i}(x^{ts})$ where $\phi_i = \theta - \alpha \nabla_\theta \mathcal{L}(\theta, D^{tr}_i)$
      Non-parametric: $y^{ts} = \mathrm{softmax}(-d(f_\theta(x^{ts}), c_n))$ where $c_n = \frac{1}{K} \sum_{(x,y) \in D^{tr}_i} \mathbb{1}(y = n)\, f_\theta(x)$
      Note: (again) you can mix & match components of the computation graph - condition on data & run gradient descent (Jiang et al. CAML '19), MAML but initialize the last layer as a ProtoNet during meta-training (Triantafillou et al. Proto-MAML '19), gradient descent on a relation-net embedding (Rusu et al. LEO '19).

  24. Black-box vs. Optimization-based vs. Non-Parametric: an algorithmic properties perspective.
      Expressive power: the ability of f to represent a range of learning procedures. Why it matters: scalability, applicability to a range of domains.
      Consistency: the learned learning procedure will monotonically improve with more data. Why it matters: reduced reliance on meta-training tasks, good out-of-distribution (OOD) task performance.
      Recall: these properties are important for most applications!

  25. Black-box vs. Optimization-based vs. Non-Parametric
      Black-box:
      + complete expressive power
      + easy to combine with a variety of learning problems (e.g. SL, RL)
      - not consistent
      - challenging optimization (no inductive bias at the initialization)
      - often data-inefficient
      Optimization-based:
      + consistent, reduces to GD
      ~ expressive for very deep models*
      + positive inductive bias at the start of meta-learning
      + handles varying & large K well
      + model-agnostic
      - second-order optimization
      - usually compute- and memory-intensive
      Non-parametric:
      + expressive for most architectures
      ~ consistent under certain conditions
      + entirely feedforward
      + computationally fast & easy to optimize
      - harder to generalize to varying K
      - hard to scale to very large K
      - so far, limited to classification
      Generally, well-tuned versions of each perform comparably on existing few-shot benchmarks! (This likely says more about the benchmarks than about the methods.) Which method to use depends on your use case.
      *for supervised learning settings
