SLIDE 1

LEEP: A New Measure to Evaluate Transferability of Learned Representations

Cuong V. Nguyen (Amazon Web Services)
Tal Hassner∗ (Facebook AI)
Matthias Seeger (Amazon Web Services)
Cedric Archambeau (Amazon Web Services)

∗Work done prior to joining Facebook AI

Correspondence to: nguycuo@amazon.com

SLIDE 2

Problem

Transferability estimation

◮ Estimating how easy it is to transfer knowledge from one classification task to another
◮ Given a pre-trained source model and a target data set
◮ Develop a measure (a score) of how effectively a transfer learning method can transfer from the source model to the target data
◮ The transferability measure should be easy and cheap to compute → ideally without any training

SLIDE 3

Why do we need transferability estimation?

◮ Help understand the relationships/structures between tasks
◮ Select groups of highly transferable tasks for joint training
◮ Select good source models for transfer learning

◮ Potentially reduce training data size and training time

SLIDE 4

Our contributions

◮ We develop a novel transferability measure, Log Expected Empirical Prediction (LEEP), for deep networks
◮ Properties of LEEP:

  ◮ Very simple
  ◮ Clear interpretation: average log-likelihood of the expected empirical predictor
  ◮ Easy to compute: no training needed, only one forward pass through the target data set
  ◮ Applicable to most modern deep networks

SLIDE 5

Log Expected Empirical Prediction (LEEP) (1)

◮ Assume a source model θ and a target data set D = {(x1, y1), . . . , (xn, yn)}
◮ We compute the LEEP score between θ and D in 3 steps.

1. Apply θ to each input xi to get the dummy label distribution θ(xi).

  ◮ θ(xi) is a distribution over the source label set Z
  ◮ Labels in Z may not semantically relate to the true label yi of xi, e.g., Z is the set of ImageNet labels but (xi, yi) comes from CIFAR

2. Compute the empirical conditional distribution of the target label y given the dummy source label z:

  Empirical joint dist.: P̂(y, z) = (1/n) Σ_{i : yi = y} θ(xi)z
  Empirical marginal dist.: P̂(z) = Σ_y P̂(y, z)
  Empirical conditional dist.: P̂(y | z) = P̂(y, z) / P̂(z)

SLIDE 6

Log Expected Empirical Prediction (LEEP) (2)

Expected Empirical Predictor (EEP)

A classifier that predicts the label y of an input x as follows:
◮ First, randomly draw a dummy label z from θ(x)
◮ Then, randomly draw y from P̂(y | z)
Equivalently, y ∼ Σ_z P̂(y | z) θ(x)z

3. LEEP is the average log-likelihood of the EEP given the data D:

  T(θ, D) = (1/n) Σ_i log ( Σ_z P̂(yi | z) θ(xi)z )
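The three steps above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' reference code; it assumes the source model's softmax outputs θ(xi) have already been collected into an n × |Z| matrix (one forward pass through the target data set), and that every source label receives nonzero marginal mass.

```python
import numpy as np

def leep_score(theta_outputs, target_labels):
    """LEEP score T(theta, D) from precomputed source-model outputs.

    theta_outputs: (n, |Z|) array; row i is the dummy label
                   distribution theta(x_i) over the source label set Z.
    target_labels: (n,) integer array of target labels y_i.
    """
    n, num_source = theta_outputs.shape
    num_target = int(target_labels.max()) + 1

    # Step 2a: empirical joint P(y, z) = (1/n) * sum_{i : y_i = y} theta(x_i)_z
    joint = np.zeros((num_target, num_source))
    for y in range(num_target):
        joint[y] = theta_outputs[target_labels == y].sum(axis=0) / n

    marginal = joint.sum(axis=0)          # Step 2b: P(z)
    conditional = joint / marginal        # Step 2c: P(y | z), shape (Y, Z)

    # Step 3: EEP likelihood of each (x_i, y_i): sum_z P(y_i | z) theta(x_i)_z,
    # then average the log-likelihoods over the target data set.
    eep = theta_outputs @ conditional.T   # shape (n, Y)
    return float(np.mean(np.log(eep[np.arange(n), target_labels])))
```

When the source outputs perfectly separate the target classes, each likelihood is 1 and the score is 0 (its maximum); less informative source models give more negative scores.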

SLIDE 7

Experiment: overview

◮ Aim: show that LEEP can predict actual transfer accuracy
◮ Procedure:

  ◮ Consider many random transfer learning tasks
  ◮ Compute LEEP scores for these tasks
  ◮ Compute the actual test accuracy of transfer learning methods on these tasks
  ◮ Evaluate the correlation between the LEEP scores and the test accuracies

◮ Transfer methods:

  ◮ Retrain head: only retrain the last fully connected layer on the target data set
  ◮ Fine-tune: replace the head classifier and fine-tune all model parameters with SGD

SLIDE 8

Experiment: LEEP vs. Transfer Accuracy

◮ Compare LEEP scores with the test accuracies of transferred models on 200 random target tasks
◮ Result: LEEP scores are highly correlated with actual test accuracies (correlation coefficients > 0.94)

[Scatter plots of test accuracy vs. LEEP score for the fine-tune and retrain-head methods, on ImageNet → CIFAR100 (ResNet18) and CIFAR10 → CIFAR100 (ResNet20)]
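The correlation analysis above can be sketched as follows; the scores and accuracies here are hypothetical placeholders, not the paper's measurements.

```python
import numpy as np

def score_accuracy_correlation(leep_scores, test_accuracies):
    """Pearson correlation coefficient between per-task LEEP scores
    and the corresponding transfer test accuracies."""
    return float(np.corrcoef(leep_scores, test_accuracies)[0, 1])
```

Applied to the 200 (LEEP score, test accuracy) pairs of a transfer method, a coefficient near 1 indicates that ranking tasks by LEEP closely matches ranking them by actual transfer accuracy.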

SLIDE 9

Experiment: LEEP with Small Data

◮ Restrict target data sets to 5 random classes and 50 examples per class
◮ Partition the range of LEEP scores into 5 transferability levels and average the test accuracies of tasks within each level
◮ Result: the higher the transferability level according to LEEP, the easier the transfer
◮ Similar results when the target data sets are imbalanced

[Bar chart of average test accuracy vs. transferability level (1–5) for the fine-tune and retrain-head methods]
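The level-based analysis can be sketched as follows, assuming equal-width bins over the observed score range (the paper's exact partitioning scheme may differ):

```python
import numpy as np

def average_accuracy_by_level(leep_scores, accuracies, num_levels=5):
    """Partition the LEEP score range into equal-width transferability
    levels (1 = lowest scores) and average the test accuracies of the
    tasks falling in each level."""
    scores = np.asarray(leep_scores, dtype=float)
    accs = np.asarray(accuracies, dtype=float)
    # Bin edges spanning the observed score range; interior edges go to digitize.
    edges = np.linspace(scores.min(), scores.max(), num_levels + 1)
    levels = np.digitize(scores, edges[1:-1]) + 1   # levels in 1..num_levels
    return {lvl: float(accs[levels == lvl].mean())
            for lvl in range(1, num_levels + 1) if np.any(levels == lvl)}
```

A monotonically increasing average accuracy across levels is the pattern the slide reports.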

SLIDE 10

Experiment: LEEP vs. Meta-Transfer Accuracy

◮ Compare LEEP scores with the test accuracy of Conditional Neural Adaptive Processes (CNAPs) (Requeima et al., 2019)
◮ CNAPs was trained on Meta-Dataset (Triantafillou et al., 2020)
◮ Target tasks are drawn from CIFAR100
◮ Result: the higher the transferability level according to LEEP, the easier the meta-transfer

[Bar chart of average test accuracy vs. transferability level (1–5) for CNAPs]

SLIDE 11

Experiment: LEEP vs. Convergence of Fine-tuned Models

◮ Compare convergence speed to a reference model
◮ Reference model: trained from scratch using only the target data set
◮ Result: the higher the transferability level according to LEEP, the better the convergence

[Plots of accuracy difference vs. the reference model over 15 epochs, for transferability levels 1–5, on ImageNet → CIFAR100 (ResNet18) and CIFAR10 → CIFAR100 (ResNet20)]

SLIDE 12

Experiment: LEEP for Source Model Selection

◮ Select from 9 candidate models and transfer to CIFAR100
◮ Compare with:

  ◮ Negative Conditional Entropy (NCE) (Tran et al., 2019)
  ◮ H score (Bao et al., 2019)
  ◮ ImageNet top-1 accuracy (Kornblith et al., 2019)

◮ Result: LEEP predicts the test accuracies better than these baselines

[Scatter plots of test accuracy vs. LEEP score, NCE score, H score, and ImageNet top-1 accuracy for the 9 candidates: ResNet18, ResNet34, ResNet50, MobileNet1.0, MobileNet0.75, MobileNet0.5, MobileNet0.25, DarkNet53, SENet154]
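The selection criterion itself reduces to ranking candidate source models by their LEEP score on the target data; a minimal sketch (the scores below are hypothetical, not the paper's):

```python
def select_source_model(leep_by_model):
    """Pick the candidate source model with the highest LEEP score
    on the target data set (scores are negative; higher is better)."""
    return max(leep_by_model, key=leep_by_model.get)
```

Each candidate's score needs only one forward pass over the target data, so the whole selection step is cheap compared with fine-tuning every candidate.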

SLIDE 13

Discussion

◮ Model selection results are very sensitive to the architecture and size of the source networks → the scores may need calibration/normalization for better performance
◮ LEEP is potentially useful for feature selection as well
◮ For very small target data sets, retraining the head directly with 2nd-order optimization methods could also be efficient

SLIDE 14

Thank you.