

  1. CS 4803 / 7643: Deep Learning
Topics: Moving beyond supervised learning
Zsolt Kira, Georgia Tech

  2. Administrivia
• Projects!
  – Due April 30th
  – Template online
  – Can use MS Word, but follow the organization/rubric!
• No posters/presentations
(C) Zsolt Kira

  3. Project Note
• Important note:
  – Your project should include doing something beyond just downloading open-source code, fine-tuning, and showing the result
  – This can include:
    • implementation of additional approaches (even if leveraging open-source code),
    • a thorough analysis/investigation of some phenomenon or hypothesis,
    • theoretical analysis, or …
• When using external resources, provide references to anything you used in the write-up!
(C) Zsolt Kira

  4. Supervised Learning
● ML has been focused largely on this
● Lots of other problem settings are now coming up:
  ○ What if we have unlabeled data?
  ○ What if we have many datasets?
  ○ What if we only have one example per (new) class?

  5. But wait, there’s more!
• Transfer Learning
• Semi-supervised learning
• One/Few-shot learning
• Un/Self-Supervised Learning
• Domain adaptation
• Meta-Learning
• Zero-shot learning
• Continual / Lifelong-learning
• Multi-modal learning
• Multi-task learning
• Active learning
• …

Setting                 Source            Target              Shift Type
Semi-supervised         Single labeled    Single unlabeled    None
Domain Adaptation       Single labeled    Single unlabeled    Non-semantic
Domain Generalization   Multiple labeled  Unknown             Non-semantic
Cross-Task Transfer     Single labeled    Single unlabeled    Semantic
Few-Shot Learning       Single labeled    Single few-labeled  Semantic
Un/Self-Supervised      Single unlabeled  Many labeled        Both/Task

(C) Zsolt Kira

  6. An Entire Class on this! • Deep Unsupervised Learning class (UC Berkeley) • Link: – https://sites.google.com/view/berkeley-cs294-158-sp20/home (C) Zsolt Kira 6

  7. But wait, there’s more! • Transfer Learning • Semi-supervised learning • One/Few-shot learning • Un/Self-Supervised Learning • Domain adaptation • Meta-Learning • Zero-shot learning • Continual / Lifelong-learning • Multi-modal learning • Multi-task learning • Active learning • … (C) Zsolt Kira 7

  8. What is Semi-Supervised Learning? Supervised Learning Semi-Supervised Learning 8 Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  9. What is Semi-Supervised Learning? Supervised Learning Semi-Supervised Learning 9 Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  10. Semi-Supervised Learning
● Classification: Fully Supervised
  ○ Training data: (image, label); predict label for new images.
● What if we have a few labeled samples and many unlabeled samples? Labeling is generally time-consuming and expensive in certain domains.
● Semi-Supervised Learning
  ○ Training data: labeled data (image, label) and unlabeled data (image)
  ○ Goal: use the unlabeled data to make supervised learning better
  ○ Note: if we have lots of labeled data, this goal is much harder
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  11. Why Semi-Supervised Learning? (Slide: Thang Luong)
● My take: Reality might be in-between:
● Might be able to improve upon the high-labeled-data regime, but with exponentially increasing unlabeled data (of the proper type)
● See
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  12. Agenda
■ Core concepts
  ■ Confidence vs Entropy
    ■ Pseudo Labeling
    ■ Entropy minimization
    ■ Virtual Adversarial Training
  ■ Label Consistency
    ■ Make sure augmentations of the sample have the same class
    ■ Pi-Model, Temporal Ensembling, Mean Teacher
  ■ Regularization
    ■ Weight decay
    ■ Dropout
    ■ Data Augmentation (MixUp, CutOut)
■ Unsupervised Data Augmentation (UDA), MixMatch
■ Co-Training / Self-Training / Pseudo Labeling (Noisy Student)
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley

  13. Pseudo Labeling
● Simple idea:
  ○ Train on labeled data
  ○ Make predictions on unlabeled data
  ○ Add confident predictions to the training data
  ○ Can do all of these end-to-end (no need for separate stages)
Slide Credit: Pieter Abbeel et al., CS294-158, UC Berkeley
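A minimal sketch of a combined pseudo-labeling training step in PyTorch-style Python. The names (`model`, the batch variables) and the 0.95 confidence threshold are illustrative assumptions, not the exact recipe from the slides.

```python
import torch
import torch.nn.functional as F

def pseudo_label_step(model, labeled_batch, unlabeled_batch, optimizer, threshold=0.95):
    (x_l, y_l), x_u = labeled_batch, unlabeled_batch

    # Supervised loss on the labeled data.
    loss_sup = F.cross_entropy(model(x_l), y_l)

    # Predict on unlabeled data; keep only confident predictions as hard pseudo-labels.
    with torch.no_grad():
        probs = F.softmax(model(x_u), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = (conf >= threshold).float()   # trust only confident predictions

    # Unsupervised loss: train toward the pseudo-labels, masked per example.
    loss_unsup = (F.cross_entropy(model(x_u), pseudo_y, reduction="none") * mask).mean()

    loss = loss_sup + loss_unsup
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```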

  14. Issue: Confidences on New Data
● Predictions on unlabeled data may be too flat (high entropy)
● Solution: Entropy minimization
● Several ways to achieve this
  ○ Explicit loss
  ○ Sharpening function (e.g. temperature scaling)
Image Credit: Figure modified from MixMatch paper
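A hedged sketch of the two options mentioned above: an explicit entropy-minimization loss, and a temperature-based sharpening function of the form used in MixMatch (T < 1 sharpens). The temperature value is illustrative.

```python
import torch

def entropy_loss(logits):
    # Mean entropy of the predicted distributions; adding this to the training
    # loss penalizes flat (high-entropy) predictions on unlabeled data.
    p = torch.softmax(logits, dim=1)
    return -(p * torch.log(p.clamp_min(1e-8))).sum(dim=1).mean()

def sharpen(p, T=0.5):
    # Raise class probabilities to the power 1/T and renormalize; T < 1 sharpens.
    p_sharp = p ** (1.0 / T)
    return p_sharp / p_sharp.sum(dim=1, keepdim=True)
```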

  15. Label Consistency with Data Augmentation 15

  16. Label Consistency with Data Augmentation Could be Unlabeled or Labeled 16

  17. Label Consistency with Data Augmentation 17

  18. Label Consistency with Data Augmentation Make sure that the logits are similar 18
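A hedged sketch of a label-consistency term: two random augmentations of the same (possibly unlabeled) image should receive similar predictions. `augment` is a placeholder for any stochastic augmentation pipeline; the MSE form follows the Pi-model style, though KL divergence is also common.

```python
import torch
import torch.nn.functional as F

def consistency_loss(model, x, augment):
    # Predictions on two independently augmented views of the same batch.
    p1 = F.softmax(model(augment(x)), dim=1)
    p2 = F.softmax(model(augment(x)), dim=1)
    # Mean squared error between the two predicted distributions.
    return ((p1 - p2) ** 2).sum(dim=1).mean()
```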

  19. More Data Augmentation -> Regularization 19

  20. Realistic Evaluation of Semi-Supervised Learning 20

  21. Outline
■ Realistic Evaluation of Semi-Supervised Learning
■ pi-model
■ Temporal Ensembling
■ Mean Teacher
■ Virtual Adversarial Training

  22. pi-Model (from “Temporal Ensembling for Semi-Supervised Learning”) 22

  23. pi-Model (from “Temporal Ensembling for Semi-Supervised Learning”) 23
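A hedged sketch of the Mean Teacher variant of this consistency idea: the "teacher" is an exponential moving average (EMA) of the student's weights, and the student is trained to match the teacher's predictions on differently augmented views. The EMA rate `alpha` is an illustrative hyperparameter.

```python
import copy
import torch
import torch.nn.functional as F

def make_teacher(student):
    # Teacher starts as a frozen copy of the student.
    teacher = copy.deepcopy(student)
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher

@torch.no_grad()
def update_teacher(teacher, student, alpha=0.99):
    # Exponential moving average of the student weights.
    for p_t, p_s in zip(teacher.parameters(), student.parameters()):
        p_t.mul_(alpha).add_(p_s, alpha=1 - alpha)

def mean_teacher_consistency(student, teacher, x_aug1, x_aug2):
    # Student prediction should match the (detached) teacher prediction.
    with torch.no_grad():
        target = F.softmax(teacher(x_aug1), dim=1)
    pred = F.softmax(student(x_aug2), dim=1)
    return ((pred - target) ** 2).sum(dim=1).mean()
```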

  24. Comparison 24

  25. Comparison 25

  26. Varying number of labels 26

  27. Class Distribution Mismatch 27

  28. MixMatch 28

  29. MixMatch 29

  30. MixMatch MixUp 30
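A hedged sketch of MixUp, the ingredient MixMatch borrows here: train on convex combinations of pairs of examples and of their (soft) labels. The Beta parameter 0.75 is an illustrative hyperparameter.

```python
import numpy as np
import torch

def mixup(x, y_onehot, alpha=0.75):
    lam = np.random.beta(alpha, alpha)
    lam = max(lam, 1 - lam)                  # MixMatch keeps the larger weight on x
    perm = torch.randperm(x.size(0))         # random pairing within the batch
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix
```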

  31. MixMatch 31

  32. MixMatch 32

  33. MixMatch 33

  34. MixMatch 34

  35. FixMatch 35
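A hedged FixMatch-style sketch of the unlabeled loss: pseudo-label a weakly augmented view, keep the label only if the confidence clears a threshold, and train the strongly augmented view toward that hard label. `weak_aug` and `strong_aug` are placeholder augmentation functions; the 0.95 threshold is illustrative.

```python
import torch
import torch.nn.functional as F

def fixmatch_unsup_loss(model, x_u, weak_aug, strong_aug, threshold=0.95):
    # Pseudo-label from the weakly augmented view (no gradients).
    with torch.no_grad():
        probs = F.softmax(model(weak_aug(x_u)), dim=1)
        conf, pseudo_y = probs.max(dim=1)
        mask = (conf >= threshold).float()   # only confident pseudo-labels count

    # Train the strongly augmented view toward the pseudo-label.
    logits_strong = model(strong_aug(x_u))
    per_example = F.cross_entropy(logits_strong, pseudo_y, reduction="none")
    return (per_example * mask).mean()
```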

  36. FixMatch - Results 36

  37. But wait, there’s more! • Transfer Learning • Semi-supervised learning • One/Few-shot learning • Un/Self-Supervised Learning • Domain adaptation • Meta-Learning • Zero-shot learning • Continual / Lifelong-learning • Multi-modal learning • Multi-task learning • Active learning • … (C) Zsolt Kira 37

  38. Few-Shot Learning (C) Zsolt Kira 38 Slide Credit: Hugo Larochelle

  39. Few-Shot Learning (C) Zsolt Kira 39 Slide Credit: Hugo Larochelle

  40. Normal Approach
• Do what we always do: Fine-tuning
  – Train classifier on base classes
  – Freeze features
  – Learn classifier weights for new classes using the small amount of labeled data (during “query” time!)
A Closer Look at Few-shot Classification, Wei-Yu Chen, Yen-Cheng Liu, Zsolt Kira, Yu-Chiang Frank Wang, Jia-Bin Huang
(C) Zsolt Kira
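A hedged sketch of this baseline: freeze the backbone trained on the base classes and fit only a new linear classifier on the few labeled support examples of the novel classes. All names, the learning rate, and the step count are illustrative assumptions.

```python
import torch
import torch.nn as nn

def finetune_new_classes(backbone, support_x, support_y, n_new_classes, steps=100):
    backbone.eval()
    for p in backbone.parameters():
        p.requires_grad_(False)              # freeze the pretrained feature extractor

    with torch.no_grad():
        feats = backbone(support_x)          # (n_support, feat_dim)

    clf = nn.Linear(feats.size(1), n_new_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.01)
    for _ in range(steps):                   # fit only the new classifier head
        loss = nn.functional.cross_entropy(clf(feats), support_y)
        opt.zero_grad(); loss.backward(); opt.step()
    return clf
```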

  41. Cons of Normal Approach
• The training we do on the base classes does not take the task into account
• No notion that we will be performing a bunch of N-way tests
• Idea: simulate what we will see during test time
(C) Zsolt Kira

  42. Meta-Training Approach
• Set up a set of smaller tasks during training that simulate what we will be doing during testing
  – Can optionally pre-train features on held-out base classes (not typical)
• Testing stage is now the same, but with new classes
https://www.borealisai.com/en/blog/tutorial-2-few-shot-learning-and-meta-learning-i/
(C) Zsolt Kira
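A hedged sketch of episodic ("meta-training") sampling: each training step draws a small N-way, K-shot task from the base classes so that training mirrors the N-way tests we will face on novel classes. `images_by_class` and the episode sizes are illustrative assumptions.

```python
import random

def sample_episode(images_by_class, n_way=5, k_shot=1, n_query=15):
    # Pick N classes, then K support and Q query examples per class.
    classes = random.sample(list(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        imgs = random.sample(images_by_class[cls], k_shot + n_query)
        support += [(img, label) for img in imgs[:k_shot]]
        query += [(img, label) for img in imgs[k_shot:]]
    return support, query   # adapt on support, evaluate/update on query
```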

  43. Meta-Learning Approaches • Learning a model conditioned on support set (C) Zsolt Kira 43

  44. More Sophisticated Meta-Learning Approaches
• Learn gradient descent:
  – Parameter initialization and update rules
  – Output:
    • Parameter initialization
    • Meta-learner that decides how to update parameters
• Learn just an initialization and use normal gradient descent (MAML)
  – Output:
    • Just parameter initialization!
    • We are using SGD
(C) Dhruv Batra & Zsolt Kira

  45. Meta-Learner
• How to parametrize learning algorithms?
• Two approaches to defining a meta-learner
  – Take inspiration from a known learning algorithm
    • kNN/kernel machine: Matching Networks (Vinyals et al., 2016)
    • Gaussian classifier: Prototypical Networks (Snell et al., 2017)
    • Gradient descent: Meta-Learner LSTM (Ravi & Larochelle, 2017), MAML (Finn et al., 2017)
  – Derive it from a black-box neural network
    • MANN (Santoro et al., 2016)
    • SNAIL (Mishra et al., 2018)
(C) Zsolt Kira; Slide Credit: Hugo Larochelle
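A hedged sketch of one of the approaches named above, a Prototypical Network episode: each class prototype is the mean embedding of its support examples, and query points are classified by negative squared distance to the prototypes. `embed` and the episode tensors are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(embed, support_x, support_y, query_x, query_y, n_way):
    z_s = embed(support_x)                               # (n_support, d)
    z_q = embed(query_x)                                 # (n_query, d)

    # One prototype per class: mean of that class's support embeddings.
    prototypes = torch.stack([z_s[support_y == c].mean(dim=0) for c in range(n_way)])

    # Logits = negative squared Euclidean distance to each prototype.
    logits = -torch.cdist(z_q, prototypes) ** 2
    return F.cross_entropy(logits, query_y)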

  46. More Sophisticated Meta-Learning Approaches
• Learn gradient descent:
  – Parameter initialization and update rules
  – Output:
    • Parameter initialization
    • Meta-learner that decides how to update parameters
• Learn just an initialization and use normal gradient descent (MAML)
  – Output:
    • Just parameter initialization!
    • We are using SGD
(C) Zsolt Kira

  47. Meta-Learner LSTM (C) Zsolt Kira 47 Slide Credit: Hugo Larochelle

  48. Meta-Learner LSTM (C) Zsolt Kira 48 Slide Credit: Hugo Larochelle

  49. Meta-Learner LSTM (C) Zsolt Kira 49 Slide Credit: Hugo Larochelle

  50. Meta-Learner LSTM (C) Zsolt Kira 50 Slide Credit: Hugo Larochelle

  51. Model-Agnostic Meta-Learning (MAML) (C) Zsolt Kira 53 Slide Credit: Hugo Larochelle

  52. Model-Agnostic Meta-Learning (MAML) (C) Zsolt Kira 55 Slide Credit: Sergey Levine

  53. Model-Agnostic Meta-Learning (MAML) (C) Zsolt Kira 56 Slide Credit: Sergey Levine
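A hedged sketch of MAML's two loops: adapt a copy of the shared initialization with a few SGD steps on each task's support set (inner loop), then update the initialization from the adapted models' query losses (outer loop). Full MAML backpropagates through the inner updates; this simplification corresponds to the first-order variant, and all names and hyperparameters are illustrative.

```python
import copy
import torch
import torch.nn.functional as F

def maml_outer_step(model, meta_opt, tasks, inner_lr=0.01, inner_steps=1):
    meta_opt.zero_grad()
    for support_x, support_y, query_x, query_y in tasks:
        fast = copy.deepcopy(model)                       # task-specific copy of the init
        inner_opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                      # inner loop: adapt on support set
            loss = F.cross_entropy(fast(support_x), support_y)
            inner_opt.zero_grad(); loss.backward(); inner_opt.step()

        # Outer loop: gradient of the adapted model's query loss, accumulated
        # onto the initialization (first-order approximation).
        query_loss = F.cross_entropy(fast(query_x), query_y)
        grads = torch.autograd.grad(query_loss, fast.parameters())
        for p, g in zip(model.parameters(), grads):
            p.grad = g if p.grad is None else p.grad + g
    meta_opt.step()
```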

  54. Comparison (C) Zsolt Kira 57 Slide Credit: Sergey Levine
