SLIDE 1

Adapted Deep Embeddings: A Synthesis of Methods for k-Shot Inductive Transfer Learning

Tyler R. Scott (1, 2), Karl Ridgeway (1, 2), Michael C. Mozer (1, 3)

(1) University of Colorado, Boulder; (2) Sensory Inc.; (3) presently at Google Brain

SLIDE 2

Inductive Transfer Learning

[Diagram: Source Domain Data and Target Domain Data are used to train a Model, which maps a Target Domain Input to a Target Domain Prediction]

SLIDE 3

Weight Transfer

  • Retrain the output layer
  • Adapt the weights to the target domain


(Yosinski et al., 2014)

[Diagram panels: Source Domain, Target Domain]
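As a rough illustration of the two options on this slide, the sketch below represents a network as a hypothetical dict of layer weight matrices (the layer names `"hidden"` and `"output"` are our assumption, not from the talk). Transfer copies the source-trained layers, swaps in a freshly initialised output layer sized for the target classes, and then either retrains only that layer or adapts every layer. The training loop itself is omitted.

```python
import copy
import random


def transfer_weights(source_params, n_target_classes, seed=0):
    """Copy a source-trained network and replace its output layer.

    source_params: hypothetical dict mapping layer name -> weight matrix
    (list of rows); the "output" matrix has one row per class.
    """
    rng = random.Random(seed)
    target_params = copy.deepcopy(source_params)  # hidden layers carried over
    hidden_dim = len(source_params["output"][0])
    # New, randomly initialised output layer: one row per target class.
    target_params["output"] = [
        [rng.gauss(0.0, 0.01) for _ in range(hidden_dim)]
        for _ in range(n_target_classes)
    ]
    return target_params


def trainable_layers(params, adapt_all_weights):
    """Layers to update on target-domain data.

    False -> "retrain output": only the new output layer is trained.
    True  -> "adapt weights": every layer is fine-tuned.
    """
    return sorted(params) if adapt_all_weights else ["output"]


source = {
    "hidden": [[0.2, -0.1], [0.4, 0.3], [0.0, 0.5]],       # 3 units, 2 inputs
    "output": [[0.1, 0.2, 0.3] for _ in range(10)],       # 10 source classes
}
target = transfer_weights(source, n_target_classes=5)
print(len(target["output"]), trainable_layers(target, adapt_all_weights=False))
# 5 ['output']
```

Keeping the head replacement separate from the choice of trainable layers makes it easy to compare the two variants on the same transferred weights.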

SLIDE 4

Deep Metric Learning

[Diagram: a network trained on the Source Domain yields a shared Source & Target Domain Embedding]

Histogram loss (Ustinova & Lempitsky, 2016)

[Plot: distributions of within-class and between-class distances]
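A minimal plain-Python sketch of the histogram loss, following our reading of Ustinova & Lempitsky (2016): pairwise cosine similarities in [-1, 1] are split into same-class (positive) and different-class (negative) pairs, each set becomes a soft histogram with linear bin assignment, and the loss is the estimated probability that a negative pair is more similar than a positive pair, L = sum over bins r of h_neg[r] * CDF_pos[r]. The bin count (51) and the helper names are our assumptions, and only the forward computation is shown; a real implementation would run this inside an autodiff framework so gradients reach the embedding network.

```python
import math


def cosine(a, b):
    """Cosine similarity, clamped to [-1, 1] against rounding error."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return max(-1.0, min(1.0, dot / norm))


def pair_similarities(embeddings, labels):
    """Split all pairwise similarities into positive and negative pairs."""
    pos, neg = [], []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            s = cosine(embeddings[i], embeddings[j])
            (pos if labels[i] == labels[j] else neg).append(s)
    return pos, neg


def soft_histogram(values, n_bins):
    """Histogram over [-1, 1]; each value is shared linearly between the
    two nearest bin centres, so the histogram sums to 1."""
    delta = 2.0 / (n_bins - 1)
    centres = [-1.0 + r * delta for r in range(n_bins)]
    hist = [0.0] * n_bins
    for v in values:
        for r, t in enumerate(centres):
            weight = 1.0 - abs(v - t) / delta
            if weight > 0.0:
                hist[r] += weight
    return [h / len(values) for h in hist]


def histogram_loss(pos_sims, neg_sims, n_bins=51):
    """Estimated probability that a negative pair's similarity exceeds
    a positive pair's: sum_r h_neg[r] * CDF_pos[r]."""
    h_pos = soft_histogram(pos_sims, n_bins)
    h_neg = soft_histogram(neg_sims, n_bins)
    loss, cdf_pos = 0.0, 0.0
    for hp, hn in zip(h_pos, h_neg):
        cdf_pos += hp  # running cumulative histogram of positive pairs
        loss += hn * cdf_pos
    return loss


embeddings = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]]
labels = [0, 0, 1, 1]
pos, neg = pair_similarities(embeddings, labels)
print(histogram_loss(pos, neg))  # well-separated classes -> 0.0
```

Because the loss depends only on the two similarity distributions, it has no margin or threshold to tune beyond the bin count; the sketch assumes both pair lists are non-empty.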

SLIDE 5

Few-Shot Learning

[Diagram: a network trained on the Source Domain yields a shared Source & Target Domain Embedding]

Prototypical nets (Snell et al., 2017)
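The core of a prototypical net fits in a few lines: each class prototype is the mean embedding of that class's labeled support examples, and a query is classified by a softmax over negative squared Euclidean distances to the prototypes. The sketch below assumes the embeddings have already been computed by the network; the function names are ours.

```python
import math


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_prototypes(support_embeddings, support_labels):
    """Prototype for each class = mean of its support embeddings."""
    members = {}
    for emb, label in zip(support_embeddings, support_labels):
        members.setdefault(label, []).append(emb)
    return {
        label: [sum(col) / len(col) for col in zip(*embs)]
        for label, embs in members.items()
    }


def class_probabilities(query, prototypes):
    """Softmax over negative squared distances to each prototype."""
    logits = {c: -squared_distance(query, p) for c, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(v - top) for c, v in logits.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def prototypical_loss(query_embeddings, query_labels, prototypes):
    """Mean negative log-probability of each query's true class."""
    total = 0.0
    for q, y in zip(query_embeddings, query_labels):
        total -= math.log(class_probabilities(q, prototypes)[y])
    return total / len(query_embeddings)


support = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
protos = compute_prototypes(support, ["a", "a", "b", "b"])
print(protos)  # {'a': [0.0, 1.0], 'b': [4.0, 1.0]}
probs = class_probabilities([0.5, 1.0], protos)
print(max(probs, key=probs.get))  # a
```

Since new classes only need a mean of k embeddings, the same trained network can classify target-domain classes it never saw during training.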

SLIDE 6

Adapted Deep Embeddings

  • 1. Train the network using an embedding loss
      • Histogram loss, prototypical nets
  • 2. Adapt the weights using limited target-domain data

[Diagram panels: Source Domain, Target Domain]
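The two-step recipe can be illustrated with a deliberately tiny, hypothetical setup: the "network" is a per-feature scaling f(x) = w * x, the embedding loss is the prototypical-net loss, and gradients are estimated by central finite differences instead of backpropagation. Step 1 trains w on a source-domain episode; step 2 starts from those weights and continues training on a few labeled target-domain examples. We assume here that step 2 reuses the same embedding loss; the toy data, learning rate, and step counts are ours.

```python
import math


def embed(w, x):
    # Hypothetical toy embedding network: per-feature scaling f(x) = w * x.
    return [wi * xi for wi, xi in zip(w, x)]


def episode_loss(w, support, query):
    """Prototypical-net loss on one episode of (x, label) pairs."""
    groups = {}
    for x, y in support:
        groups.setdefault(y, []).append(embed(w, x))
    protos = {y: [sum(c) / len(c) for c in zip(*e)] for y, e in groups.items()}
    total = 0.0
    for x, y in query:
        q = embed(w, x)
        logits = {c: -sum((a - b) ** 2 for a, b in zip(q, p))
                  for c, p in protos.items()}
        top = max(logits.values())
        log_z = top + math.log(sum(math.exp(v - top) for v in logits.values()))
        total += log_z - logits[y]  # = -log p(y | x)
    return total / len(query)


def train(w, support, query, steps=100, lr=0.1, eps=1e-5):
    """Gradient descent using central finite-difference gradients."""
    w = list(w)
    for _ in range(steps):
        grad = []
        for i in range(len(w)):
            up, down = list(w), list(w)
            up[i] += eps
            down[i] -= eps
            grad.append((episode_loss(up, support, query)
                         - episode_loss(down, support, query)) / (2 * eps))
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Source classes differ in feature 0; target classes differ in feature 1.
source_support = [([0.0, 0.0], "s1"), ([1.0, 0.0], "s2")]
source_query = [([0.1, 0.0], "s1"), ([0.9, 0.0], "s2")]
target_support = [([0.0, 0.0], "t1"), ([0.0, 1.0], "t2")]  # k = 1
target_query = [([0.0, 0.1], "t1"), ([0.0, 0.9], "t2")]

w_source = train([0.5, 0.5], source_support, source_query)            # step 1
w_adapted = train(w_source, target_support, target_query, steps=20)  # step 2

print(episode_loss(w_source, target_support, target_query))   # static embedding
print(episode_loss(w_adapted, target_support, target_query))  # adapted: lower
```

In this toy the source episode never exercises feature 1, so the static embedding leaves its weight untouched; adaptation is what finally scales up the feature that separates the target classes.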

SLIDE 7

Why haven't these approaches been compared before?


Labeled examples per target class (k) typically assumed by each approach:

  • Weight transfer: k > 100
  • Deep metric learning: agnostic to k
  • Few-shot learning: k < 20

SLIDE 8

MNIST

  • Source domain: 2200 labeled examples per class
  • Target domain: k labeled examples per class
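A sketch of how such a split might be drawn: each class contributes a fixed number of labeled examples, with a large budget for source classes and only k for each target class. The digit split (0 to 4 as source, 5 to 9 as target, giving n = 5 target classes), the stand-in data, and the function name are hypothetical; the slide states only the per-class counts.

```python
import random
from collections import defaultdict


def sample_per_class(examples, classes, per_class, seed=0):
    """Draw `per_class` labeled examples from each class in `classes`.

    examples: list of (x, label) pairs. Raises if a class is too small.
    """
    by_class = defaultdict(list)
    for x, label in examples:
        if label in classes:
            by_class[label].append((x, label))
    rng = random.Random(seed)  # fixed seed -> reproducible subsets
    subset = []
    for label in sorted(classes):
        pool = by_class[label]
        if len(pool) < per_class:
            raise ValueError(f"class {label!r} has only {len(pool)} examples")
        subset.extend(rng.sample(pool, per_class))
    return subset


# Hypothetical split of the ten digit classes.
source_classes = {0, 1, 2, 3, 4}
target_classes = {5, 6, 7, 8, 9}

# Stand-in data: 3000 dummy examples per digit (real images would go here).
data = [(f"img_{d}_{i}", d) for d in range(10) for i in range(3000)]

source_train = sample_per_class(data, source_classes, per_class=2200)
target_train = sample_per_class(data, target_classes, per_class=10)  # k = 10
print(len(source_train), len(target_train))  # 11000 50
```

Sweeping `per_class` for the target classes over values such as 1, 5, 10, ..., 1000 produces the k axis of the plots that follow.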

SLIDE 9

[Figure: MNIST, n = 5. y-axis: target-domain test accuracy; x-axis: labeled examples in target domain, k (1 to 1000). Curves: Baseline]

SLIDE 10

[Figure: MNIST, n = 5. y-axis: target-domain test accuracy; x-axis: labeled examples in target domain, k (1 to 1000). Curves: Weight Adaptation, Baseline]

SLIDE 11

[Figure: MNIST, n = 5. y-axis: target-domain test accuracy; x-axis: labeled examples in target domain, k (1 to 1000). Curves: Prototypical Net, Weight Adaptation, Baseline]

SLIDE 12

[Figure: MNIST, n = 5. y-axis: target-domain test accuracy; x-axis: labeled examples in target domain, k (1 to 1000). Curves: Histogram Loss, Prototypical Net, Weight Adaptation, Baseline]

SLIDE 13

[Figure: MNIST, n = 5. y-axis: target-domain test accuracy; x-axis: labeled examples in target domain, k (1 to 1000). Curves: Adapted Histogram Loss, Adapted Prototypical Net, Histogram Loss, Prototypical Net, Weight Adaptation, Baseline]

SLIDE 14

[Figure: target-domain test accuracy on further datasets. Panels: Isolet, n = 5 and n = 10 (x-axis: k, 1 to 200); Omniglot, k = 1, 5, and 10 (x-axis: number of target classes n, 5 to 1000); tinyImageNet, n = 5, 10, and 50 (x-axis: k, 1 to 300). Curves: Adapted Histogram Loss, Adapted Prototypical Net, Histogram Loss, Prototypical Net, Weight Adaptation, Baseline]

SLIDE 15

Conclusion

  • Weight transfer is the least effective method for inductive transfer learning
  • Histogram loss is robust regardless of the amount of labeled data in the target domain
  • Adapted embeddings outperform every static embedding method previously proposed

SLIDE 16

Poster #167, Room 210 & 230 AB, today 5:00-7:00 PM