SLIDE 1

Semi-Supervised Learning

Jia-Bin Huang, Virginia Tech

Spring 2019

ECE-5424G / CS-5824

SLIDE 2

Administrative

SLIDE 3

Advanced Topics

  • Semi-supervised learning
  • Ensemble learning
  • Generative models
  • Sequence prediction models
  • Deep reinforcement learning
SLIDE 4

Semi-supervised Learning Problem Formulation

  • Labeled data

π‘‡π‘š = 𝑦 1 , 𝑧 1 , 𝑦 2 , 𝑧 2 , β‹― , 𝑦 π‘›π‘š , 𝑧 π‘›π‘š

  • Unlabeled data

$S_u = \{x^{(1)}, x^{(2)}, \cdots, x^{(m_u)}\}$

  • Goal: Learn a hypothesis $h_\theta$ (e.g., a classifier) that has small error
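
In code, this training objective typically pairs a supervised loss on $S_l$ with an unsupervised regularizer on $S_u$. A minimal PyTorch sketch, where `model`, `unsup_loss`, and the weight `lam` are placeholder names rather than anything from the slides:

```python
import torch.nn.functional as F

def semi_supervised_loss(model, x_l, y_l, x_u, unsup_loss, lam=1.0):
    """Supervised cross-entropy on labeled data S_l plus a weighted
    unsupervised regularizer that only needs the inputs in S_u."""
    sup = F.cross_entropy(model(x_l), y_l)  # uses (x, y) pairs from S_l
    unsup = unsup_loss(model, x_u)          # uses unlabeled x from S_u
    return sup + lam * unsup
```

The consistency and entropy-based methods on the following slides are different choices of `unsup_loss`.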
SLIDE 5

Deep Semi-supervised Learning

SLIDE 6

Semi-supervised Learning

  • Motivation
  • Problem formulation
  • Consistency regularization
  • Entropy-based method
SLIDE 7

Stochastic Perturbations / Π-Model

  • Realistic perturbations $x \to \hat{x}$ of data points $x \in D_{UL}$ should not significantly change the output of $h_\theta(x)$
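
A minimal sketch of this consistency term as used by the Π-model (Laine & Aila, 2017): run the same unlabeled batch through the network twice under stochastic augmentation and dropout, and penalize disagreement between the two predictions. The `augment` function is an assumed stand-in for the realistic perturbations above:

```python
import torch.nn.functional as F

def pi_model_loss(model, x_u, augment):
    """Pi-model consistency: two stochastic forward passes of the same
    unlabeled batch should produce similar predictions."""
    model.train()  # keep dropout stochastic so the two passes differ
    p1 = F.softmax(model(augment(x_u)), dim=1)
    p2 = F.softmax(model(augment(x_u)), dim=1)
    return F.mse_loss(p1, p2)
```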

SLIDE 8

Temporal Ensembling
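
Temporal ensembling (Laine & Aila, 2017) keeps the Π-model's consistency target but replaces the second forward pass with an exponential moving average of each sample's predictions from earlier epochs, which is cheaper and gives a less noisy target. A simplified per-batch sketch (the paper updates the ensemble once per epoch); the accumulator shape, `alpha`, and the indexing scheme are assumptions:

```python
import torch
import torch.nn.functional as F

num_unlabeled, num_classes = 50000, 10       # assumed dataset size
alpha = 0.6                                  # assumed EMA momentum
Z = torch.zeros(num_unlabeled, num_classes)  # running average of predictions

def temporal_ensembling_loss(model, x_u, idx, epoch):
    """Consistency to a bias-corrected EMA of the model's own past
    predictions. `idx` holds the dataset indices of the batch and
    `epoch` counts from 1, so the correction never divides by zero."""
    p = F.softmax(model(x_u), dim=1)
    target = (Z[idx] / (1 - alpha ** epoch)).detach()
    Z[idx] = alpha * Z[idx] + (1 - alpha) * p.detach()
    return F.mse_loss(p, target)
```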

SLIDE 9

Mean Teacher
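
Mean Teacher (Tarvainen & Valpola, 2017) averages model weights instead of predictions: a teacher network whose parameters are an exponential moving average of the student's provides the consistency targets. A minimal sketch, with `augment` and `ema_decay` as assumed placeholders:

```python
import torch
import torch.nn.functional as F

def update_teacher(student, teacher, ema_decay=0.999):
    """Teacher weights = exponential moving average of student weights;
    called once after every optimizer step on the student."""
    with torch.no_grad():
        for p_s, p_t in zip(student.parameters(), teacher.parameters()):
            p_t.mul_(ema_decay).add_(p_s, alpha=1 - ema_decay)

def mean_teacher_loss(student, teacher, x_u, augment):
    """Consistency between student and teacher predictions under
    independent perturbations; only the student receives gradients."""
    p_student = F.softmax(student(augment(x_u)), dim=1)
    with torch.no_grad():
        p_teacher = F.softmax(teacher(augment(x_u)), dim=1)
    return F.mse_loss(p_student, p_teacher)
```

The teacher is typically initialized as a copy of the student (e.g., `copy.deepcopy(student)`) and is never updated by gradient descent.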

SLIDE 10

Virtual Adversarial Training
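
Virtual Adversarial Training (Miyato et al., 2018) replaces random perturbations with the adversarial direction that most changes the current prediction, found with a single power-iteration step. A simplified sketch; `xi` and `eps` are assumed hyperparameters:

```python
import torch
import torch.nn.functional as F

def _normalize(d):
    """Rescale each per-sample perturbation to unit L2 norm."""
    norm = d.flatten(1).norm(dim=1).view(-1, *([1] * (d.dim() - 1)))
    return d / (norm + 1e-8)

def vat_loss(model, x_u, xi=1e-6, eps=8.0):
    """KL divergence between clean predictions and predictions under
    the virtual adversarial perturbation."""
    with torch.no_grad():
        p = F.softmax(model(x_u), dim=1)      # fixed "clean" prediction

    d = _normalize(torch.randn_like(x_u))     # random starting direction
    d.requires_grad_()
    log_p_hat = F.log_softmax(model(x_u + xi * d), dim=1)
    dist = F.kl_div(log_p_hat, p, reduction="batchmean")
    d = _normalize(torch.autograd.grad(dist, d)[0])  # one power iteration

    log_p_hat = F.log_softmax(model(x_u + eps * d.detach()), dim=1)
    return F.kl_div(log_p_hat, p, reduction="batchmean")
```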

SLIDE 11

Semi-supervised Learning

  • Motivation
  • Problem formulation
  • Consistency regularization
  • Entropy-based method
SLIDE 12

Entropy minimization

  • Encourages more confident predictions on unlabeled data.
  • EntMin: directly minimizes the entropy of predictions on unlabeled data
  • Pseudo-labeling: adds confidently predicted samples into the training set (both losses are sketched below)
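
A minimal sketch of both losses; the confidence `threshold` is an assumed hyperparameter:

```python
import torch
import torch.nn.functional as F

def entropy_min_loss(model, x_u):
    """EntMin: penalize the entropy of predictions on unlabeled data,
    pushing the model toward confident, low-entropy outputs."""
    p = F.softmax(model(x_u), dim=1)
    return -(p * torch.log(p + 1e-8)).sum(dim=1).mean()

def pseudo_label_loss(model, x_u, threshold=0.95):
    """Pseudo-labeling: treat sufficiently confident predictions as
    ground-truth labels for an ordinary cross-entropy loss."""
    logits = model(x_u)
    conf, pseudo = F.softmax(logits.detach(), dim=1).max(dim=1)
    mask = conf >= threshold
    if not mask.any():
        return logits.new_zeros(())  # no confident samples in this batch
    return F.cross_entropy(logits[mask], pseudo[mask])
```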
SLIDE 13

Comparison

SLIDE 14

Varying number of labels

SLIDE 15

Class mismatch between labeled and unlabeled datasets hurts performance

SLIDE 16

Lessons

  • Compare methods using a standardized architecture and an equal budget for tuning hyperparameters
  • Unlabeled data from a different class distribution is not that useful
  • Most methods don’t work well in the very low labeled-data regime
  • Transferring a pre-trained ImageNet model produces a lower error rate
  • These conclusions are based on small datasets, though
SLIDE 17

Ensemble methods

  • Bagging
  • Gradient boosting
  • AdaBoost

The following slides are from Alex Ihler.

SLIDE 18