Laplacian Regularized Few Shot Learning (LaplacianShot) - PowerPoint PPT Presentation


SLIDE 1

Laplacian Regularized Few Shot Learning (LaplacianShot)

Imtiaz Masud Ziko, Jose Dolz, Eric Granger and Ismail Ben Ayed

ETS Montreal

SLIDE 2

Overview

Few-Shot Learning
  • What and why?
  • Brief discussion on existing approaches

Proposed LaplacianShot
  • The context
  • Proposed formulation
  • Optimization
  • Proposed Algorithm

Experiments
  • Experimental Setup
  • SOTA results on 5 different few-shot benchmarks

SLIDE 3

Few-Shot Learning (An example)

SLIDE 4

Few-Shot Learning (An example)

  • Given C = 5 classes
  • Each class c having 1 example (5-way 1-shot)

Learn a model from these (the support examples) to classify this (the query).

SLIDE 5

Few-Shot Learning (An example)

  • Given C = 5 classes
  • Each class c having 1 example (5-way 1-shot)

Learn a model from these (the support examples) to classify this (the query).

SLIDE 6

Few-Shot Learning

Humans recognize perfectly from just a few examples.

SLIDE 7

Few-Shot Learning

❏ Modern ML methods generalize poorly.
❏ Need a better way.

SLIDE 8

Few-shot learning

A very large body of recent work, mostly based on the meta-learning framework.

SLIDE 9

Meta-Learning Framework

SLIDE 10

Meta-Learning Framework

Training set with enough labeled data (base classes different from the test classes).

SLIDE 11

Meta-Learning Framework

Training set with enough labeled data to learn an initial model.

SLIDE 12

Meta-Learning Framework

Create episodes and do episodic training to learn the meta-learner.

Vinyals et al. (NeurIPS '16), Snell et al. (NeurIPS '17), Sung et al. (CVPR '18), Finn et al. (ICML '17), Ravi et al. (ICLR '17), Lee et al. (CVPR '19), Hu et al. (ICLR '20), Ye et al. (CVPR '20), ...

SLIDE 13

Taking a few steps backward...

Recently [Chen et al., ICLR '19; Wang et al., '19; Dhillon et al., ICLR '20]:

Simple baselines outperform the overly convoluted meta-learning based approaches.

SLIDE 14

Baseline Framework

No need to meta-train

SLIDE 15

Baseline Framework

Simple conventional cross-entropy training. The approaches mostly differ during inference.

SLIDE 16

Inductive vs Transductive inference

Inductive: predict for each query/test point one at a time, given the support examples.

Vinyals et al., NeurIPS '16 (attention mechanism); Snell et al., NeurIPS '17 (nearest prototype)

SLIDE 17

Inductive vs Transductive inference

Transductive: predict for all test points together, instead of one at a time.

Liu et al., ICLR '19 (label propagation); Dhillon et al., ICLR '20 (transductive fine-tuning)

SLIDE 18

Proposed LaplacianShot

  • Latent assignment matrix for the N query samples: Y = [y_1, ..., y_N]
  • Label assignment for each query q: y_q = [y_q1, ..., y_qC]
  • And simplex constraints: y_qc ≥ 0, Σ_c y_qc = 1

Laplacian-regularized objective:

E(Y) = Σ_q y_qᵀ a_q + (λ/2) Σ_{q,p} w(x_q, x_p) ‖y_q − y_p‖²

where a_q collects the distances from query x_q to the class prototypes.
SLIDE 19

Proposed LaplacianShot

Laplacian-regularized objective:

Nearest-prototype classification term: similar to ProtoNet (Snell '17) or SimpleShot (Wang '19).

Laplacian regularization term: well known with the graph Laplacian, e.g. spectral clustering (Shi & Malik '00, Von Luxburg '07), SLK (Ziko '18), SSL (Weston '12, Belkin '06).

SLIDE 20

LaplacianShot Takeaways

✓ SOTA results without bells and whistles.
✓ Simple constrained graph clustering works very well.
✓ No network fine-tuning, nor meta-learning.
✓ Fast transductive inference: almost inductive time.
✓ Model agnostic.

SLIDE 21

LaplacianShot: More Details

SLIDE 22

Proposed LaplacianShot

Laplacian-regularized objective: the nearest-prototype classification term.

  • Feature embedding:
  • The prototype can be:
    • the support example in 1-shot, or
    • the simple mean of the support examples, or
    • a weighted mean of both the support and the initially predicted query samples.

Labeling according to the nearest support prototypes.
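The nearest-prototype term can be sketched in a few lines of NumPy (function and variable names here are illustrative, not from the slides; prototypes are taken as simple class means of the support embeddings):

```python
import numpy as np

def class_prototypes(z_s, y_s, n_classes):
    """Prototype m_c = mean of the support embeddings of class c
    (this is just the support example itself in the 1-shot case)."""
    return np.stack([z_s[y_s == c].mean(axis=0) for c in range(n_classes)])

def nearest_prototype(z_q, protos):
    """Unary costs a_q(c) = squared distance of query q to prototype c,
    and the resulting nearest-prototype labels."""
    a = ((z_q[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return a, a.argmin(axis=1)
```

Without the Laplacian term, these labels are exactly the SimpleShot-style predictions.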

SLIDE 23

Proposed LaplacianShot

Laplacian-regularized objective: the Laplacian regularization term.

Well known with the graph Laplacian: encourages nearby points (under the pairwise similarity) to have similar assignments.
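One standard way to instantiate the pairwise similarities is a binary k-nearest-neighbour graph over the query embeddings; a minimal sketch, assuming binary weights (other kernel choices are possible):

```python
import numpy as np

def knn_affinity(z, k=3):
    """Binary kNN affinities: w_qp = 1 if x_p is among the k nearest
    neighbours of x_q, else 0 (no self-edges)."""
    d = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)                # exclude self-similarity
    w = np.zeros_like(d)
    nn = np.argsort(d, axis=1)[:, :k]          # k closest points per row
    w[np.arange(len(z))[:, None], nn] = 1.0
    return w
```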

SLIDE 24

Proposed Optimization

The Laplacian-regularized objective is tricky to optimize due to:

SLIDE 25

Proposed Optimization

Tricky to optimize due to:
✖ Simplex/integer constraints.

SLIDE 26

Proposed Optimization

Tricky to optimize due to:
✖ Laplacian over discrete variables.

SLIDE 27

Proposed Optimization

Relax the integer constraints:
➢ Convex quadratic problem
✖ Requires solving for all N×C variables together
✖ Extra projection steps for the simplex constraints

SLIDE 28

Proposed Optimization

We do:
✓ Concave relaxation
✓ Independent and closed-form updates for each assignment variable
✓ Efficient bound optimization

SLIDE 29

Concave Laplacian

SLIDE 30

Concave Laplacian

‖y_q − y_p‖² = 0 when the two (one-hot) assignments are equal.

SLIDE 31

Concave Laplacian

‖y_q − y_p‖² = 2 when the assignments are not equal.

SLIDE 32

Concave Laplacian

Summing over pairs: (1/2) Σ_{q,p} w_qp ‖y_q − y_p‖² = Σ_q d_q − Σ_{q,p} w_qp y_qᵀ y_p, where d_q = Σ_p w_qp is the degree of point q.

SLIDE 33

Concave Laplacian

Remove the constant terms: up to the constant Σ_q d_q, the Laplacian term equals − Σ_{q,p} w_qp y_qᵀ y_p.

SLIDE 34

Concave Laplacian

− Σ_{q,p} w_qp y_qᵀ y_p = −tr(Yᵀ W Y) is concave for a PSD affinity matrix W.
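The steps on slides 30-34 can be assembled in one place (a reconstruction from the fragments above, assuming one-hot assignment vectors and a symmetric affinity matrix W):

```latex
% For one-hot assignment vectors y_q (so \|y_q\|^2 = 1):
\|y_q - y_p\|^2 = \|y_q\|^2 + \|y_p\|^2 - 2\, y_q^\top y_p
= \begin{cases} 0 & \text{if } y_q = y_p \\ 2 & \text{if } y_q \neq y_p \end{cases}

% Summing with pairwise weights w_{qp}, with degrees d_q = \sum_p w_{qp}:
\tfrac{1}{2} \sum_{q,p} w_{qp} \|y_q - y_p\|^2
= \sum_q d_q - \sum_{q,p} w_{qp}\, y_q^\top y_p

% Dropping the constant \sum_q d_q leaves
- \sum_{q,p} w_{qp}\, y_q^\top y_p = -\operatorname{tr}(Y^\top W Y),
% which is concave whenever W is positive semi-definite.
```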

SLIDE 35

Concave-Convex relaxation

Convex barrier function:
  • Avoids extra dual variables for the non-negativity constraints
  • Closed-form update for the simplex-constraint dual

Putting it all together

SLIDE 36

Bound optimization

First-order approximation of the concave term, keeping the unary term fixed.

SLIDE 37

Bound optimization

We get a tight iterative upper bound, which we optimize iteratively.

SLIDE 38

Bound optimization

Independent upper bound: the bound decomposes over the assignment variables, one per query point.

SLIDE 39

Bound optimization

Minimizing the independent upper bound: the KKT conditions bring closed-form updates.
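These closed-form updates amount to a softmax per query point; a minimal sketch (names are mine, and the exact constant absorbed into the regularization weight `lam` depends on how the Laplacian term is normalized):

```python
import numpy as np

def _softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def bound_updates(a, w, lam=1.0, n_iter=20):
    """Iterative closed-form updates from the bound optimization.
    a: (N, C) unary costs (distances to prototypes); w: (N, N) affinities.
    Each row y_q is updated independently given the previous iterate:
        y_qc  proportional to  exp(-a_qc + lam * sum_p w_qp * y_pc)."""
    y = _softmax(-a)                 # initialize from the unary term alone
    for _ in range(n_iter):
        y = _softmax(-a + lam * (w @ y))
    return y
```

Each iteration is one matrix product plus a softmax, which is why the transductive inference runs in almost inductive time.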

SLIDE 40

LaplacianShot Algorithm
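Putting the pieces together, inference on one few-shot task might look like the following end-to-end sketch (all names are illustrative assumptions; the official implementation is linked on the final slide):

```python
import numpy as np

def softmax(logits):
    logits = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def laplacian_shot(z_s, y_s, z_q, n_classes, k=3, lam=1.0, n_iter=20):
    """LaplacianShot inference sketch: no fine-tuning, no meta-learning.
    z_s, y_s: support embeddings and labels; z_q: query embeddings."""
    # 1. Prototypes: mean of the support embeddings per class
    m = np.stack([z_s[y_s == c].mean(axis=0) for c in range(n_classes)])
    # 2. Unary costs: squared distance of each query to each prototype
    a = ((z_q[:, None, :] - m[None, :, :]) ** 2).sum(axis=-1)
    # 3. Binary kNN affinities among the query points
    d = ((z_q[:, None, :] - z_q[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)
    w = np.zeros_like(d)
    w[np.arange(len(z_q))[:, None], np.argsort(d, axis=1)[:, :k]] = 1.0
    # 4. Iterative closed-form bound updates, then hard labels
    y = softmax(-a)
    for _ in range(n_iter):
        y = softmax(-a + lam * (w @ y))
    return y.argmax(axis=1)
```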

SLIDE 41

Experiments

Datasets:

1. miniImageNet
2. tieredImageNet
3. CUB-200-2011
4. iNat

Generic classification:
  • miniImageNet splits: 64 base, 16 validation and 20 test classes
  • tieredImageNet splits: 351 base, 97 validation and 160 test classes

Fine-grained classification (CUB):
  • Splits: 100 base, 50 validation and 50 test classes

SLIDE 42

Experiments

Datasets:

1. miniImageNet
2. tieredImageNet
3. CUB-200-2011
4. iNat

Evaluation protocol:
  • 5-way 1-shot/5-shot
  • 15 query samples per class (N = 75)
  • Average accuracy over 10,000 few-shot tasks with 95% confidence interval
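The 5-way 1-shot protocol above can be made concrete with a task sampler (a hypothetical helper, not from the slides):

```python
import numpy as np

def sample_task(feats, labels, n_way=5, n_shot=1, n_query=15, seed=None):
    """Sample one few-shot task: n_way classes, with n_shot support and
    n_query query examples per class (N = n_way * n_query queries)."""
    rng = np.random.default_rng(seed)
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    xs, ys, xq, yq = [], [], [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        xs.append(feats[idx[:n_shot]])                  # support examples
        ys += [new_label] * n_shot
        xq.append(feats[idx[n_shot:n_shot + n_query]])  # query examples
        yq += [new_label] * n_query
    return (np.concatenate(xs), np.array(ys),
            np.concatenate(xq), np.array(yq))
```

Averaging accuracy over many such sampled tasks gives the reported numbers and confidence intervals.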

SLIDE 43

Experiments

Datasets:

1. miniImageNet
2. tieredImageNet
3. CUB-200-2011
4. iNat

iNat:
  • More realistic and challenging
  • Recently introduced (Wertheimer & Hariharan, 2019)
  • Subtle class distinctions
  • Imbalanced class distribution with a variable number of supports/queries per class

SLIDE 44

Experiments

Datasets:

1. miniImageNet
2. tieredImageNet
3. CUB-200-2011
4. iNat

Evaluation protocol (iNat):
  • 227-way multi-shot
  • Top-1 accuracy averaged over the test images per class
  • Top-1 accuracy averaged over all the test images (mean)

SLIDE 45

Experiments

We do:
  • Cross-entropy training with the base classes
  • LaplacianShot during inference

SLIDE 46

Results (Mini-ImageNet)

SLIDE 47

Results (Mini-ImageNet)

SLIDE 48

Results (Tiered-ImageNet)

SLIDE 49

Results (CUB)

Cross Domain

SLIDE 50

Results (iNat)

SLIDE 51

Ablation: Choosing λ

SLIDE 52

Ablation: Convergence

SLIDE 53

Ablation: Average Inference time

Transductive

SLIDE 54

LaplacianShot Takeaways

✓ SOTA results without bells and whistles.
✓ Simple constrained graph clustering works very well.
✓ No network fine-tuning, nor meta-learning.
✓ Fast transductive inference: almost inductive time.
✓ Model agnostic: plugs in during inference with any training model, with gains of up to 4-5%!

SLIDE 55

Thank you

Code on: https://github.com/imtiazziko/LaplacianShot