TagProp: Discriminative Metric Learning in Nearest Neighbour Models for Image Auto-Annotation


slide-1
SLIDE 1


TagProp: Discriminative Metric Learning in Nearest Neighbour Models for Image Auto-Annotation

Paper by Guillaumin, Mensink, Verbeek, Schmid
Presented by Daniel Rios-Pavia, Thomas Vincent-Sweet

UJF, Ensimag

January 14, 2011

slide-2
SLIDE 2


Layout

1. Introduction
2. Metric Learning: Tag prediction, Rank-based, Distance-based, Sigmoidal modulation
3. Data Sets and Evaluation: Feature Extraction, Data Sets, Evaluation
4. Results
5. Conclusion


slide-4
SLIDE 4


TagProp: Tag Propagation

Aim: tag images automatically through keyword relevance prediction.

Applications:
• image annotation
• image search

slide-5
SLIDE 5


Auto-Annotation Example


slide-7
SLIDE 7


Predicting Tag Relevance

• Propagate annotations from training images to new images
• Use metric learning instead of a fixed metric or ad-hoc combinations of metrics
slide-8
SLIDE 8


Weighted Nearest Neighbour Tag Prediction

Tags are either absent or present (i: image, w: word): y_iw ∈ {−1, +1}

Tag presence prediction p(y_iw = +1):

    p(y_iw = +1) = Σ_j π_ij · p(y_iw = +1 | j)

    p(y_iw = +1 | j) = 1 − ε   if y_jw = +1
                       ε       otherwise

with π_ij the weight of training image j for predictions for image i, where π_ij ≥ 0 and Σ_j π_ij = 1.
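A minimal sketch of this prediction rule in Python (the function and array names are ours, not the authors'; ε is a small constant in practice):

```python
import numpy as np

def tag_presence_prob(pi_i, y_w, eps=1e-5):
    """p(y_iw = +1) = sum_j pi_ij * p(y_iw = +1 | j).

    pi_i : (J,) weights pi_ij of the J training images for image i
           (non-negative, summing to 1)
    y_w  : (J,) labels y_jw in {-1, +1} for word w on the training images
    """
    p_given_j = np.where(y_w == 1, 1.0 - eps, eps)  # 1 - eps if y_jw = +1, else eps
    return float(pi_i @ p_given_j)
```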

slide-11
SLIDE 11


Weighted Nearest Neighbour Tag Prediction

Estimation of the parameters that control the weights π_ij.

Maximize the log-likelihood of the predictions:

    L = Σ_{i,w} c_iw · log p(y_iw)

where c_iw is a cost that takes the presence/absence imbalance into account:

    c_iw = 1/n⁺  if y_iw = +1,   1/n⁻ otherwise

with n⁺ the total number of positive labels (and n⁻ the total number of negative labels).
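A sketch of this objective, reusing the conventions of the previous snippet (names are ours):

```python
def tag_log_likelihood(P, Y):
    """L = sum_{i,w} c_iw * log p(y_iw) with rebalancing costs c_iw.

    P : (N, W) predicted probabilities p(y_iw = +1)
    Y : (N, W) ground-truth labels in {-1, +1}
    """
    pos = (Y == 1)
    c = np.where(pos, 1.0 / pos.sum(), 1.0 / (~pos).sum())  # c_iw = 1/n+ or 1/n-
    p_y = np.where(pos, P, 1.0 - P)  # probability assigned to the observed label
    return float((c * np.log(p_y)).sum())
```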

slide-12
SLIDE 12


Example

slide-13
SLIDE 13


Example

    p(y_iw = +1 | j) = 1 − ε   if y_jw = +1,   ε otherwise
slide-14
SLIDE 14


Rank-based Weighting

Fixed weight for the k-th neighbour: π_ij = γ_k when j is the k-th neighbour of i.

K neighbours → K parameters. L is concave with respect to {γ_k} and can be maximized by:
• an EM algorithm
• projected gradient descent

The effective neighbourhood size is set automatically.

[Plot: learned weight γ_k as a function of neighbour rank k (ranks 1–20, weights roughly 0.05–0.25)]
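A sketch of how the learned per-rank weights turn into π_ij for one image (hypothetical names):

```python
def rank_based_weights(dist_i, gamma):
    """pi_ij = gamma_k when j is the k-th nearest neighbour of image i.

    dist_i : (J,) distances from image i to the J training images
    gamma  : (K,) learned per-rank weights (non-negative, summing to 1)
    """
    pi_i = np.zeros_like(dist_i, dtype=float)
    knn = np.argsort(dist_i)[:len(gamma)]  # the K nearest neighbours, in rank order
    pi_i[knn] = gamma                      # k-th neighbour receives gamma[k]
    return pi_i
```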

slide-15
SLIDE 15


Distance-based Weighting

Weights given by a visual distance d_θ:

    π_ij = exp(−d_θ(i, j)) / Σ_k exp(−d_θ(i, k))

where θ are the parameters we want to optimize.

Weights depend smoothly on the distance, which is important if the distance needs to be adjusted during training.

Only one parameter per base distance
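A sketch of this soft-max weighting (the stability shift is an implementation detail we add, not part of the slide):

```python
def distance_based_weights(d_i):
    """pi_ij = exp(-d_theta(i, j)) / sum_k exp(-d_theta(i, k)).

    d_i : (J,) distances d_theta(i, j) from image i to each training image j
    """
    z = np.exp(-(d_i - d_i.min()))  # shifting by the min leaves the ratio unchanged
    return z / z.sum()
```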

slide-16
SLIDE 16


Distance-based Weighting

Choices for d_θ include (not exhaustive):
• a fixed distance d with a positive scale factor
• a linear combination d_w(i, j) = wᵀ d_ij, with d_ij a vector of base distances and w the positive coefficients of the combination
• a Mahalanobis distance

As before, a projected gradient algorithm maximizes the log-likelihood and learns the distance combination.
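A sketch of the linear combination and of the projection step that keeps w positive (names are ours):

```python
def combined_distance(D_i, w):
    """d_w(i, j) = w^T d_ij, evaluated for all training images j at once.

    D_i : (J, F) base distances from image i, one column per feature
          (e.g. GIST, colour histograms, bag-of-words)
    w   : (F,) non-negative combination coefficients
    """
    return D_i @ w

def project_nonneg(w):
    """Projection used after each gradient step: clip w onto w >= 0."""
    return np.maximum(w, 0.0)
```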

slide-17
SLIDE 17


Boosting the Recall of Rare Words

Keywords with a low frequency in the database have low recall:
• the mass of neighbours carrying the keyword is too small
• the keyword's predicted relevance is systematically low

Boosting is needed.

slide-18
SLIDE 18


Sigmoidal modulation

Word-specific logistic discriminant model: the "dynamic range" is adjusted per word.

    p(y_iw = +1) = σ(α_w x_iw + β_w)

with

    σ(z) = 1 / (1 + exp(−z))   and   x_iw = Σ_j π_ij y_jw

Adds 2 parameters per word, {α_w, β_w}. They are optimized during training by alternating maximization over {α_w, β_w} and the neighbour weights π_ij.
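A sketch of the modulated prediction (hypothetical names; α_w and β_w come from training):

```python
def sigmoid_modulated_prob(pi_i, y_w, alpha_w, beta_w):
    """p(y_iw = +1) = sigma(alpha_w * x_iw + beta_w)."""
    x_iw = float(pi_i @ y_w)  # x_iw = sum_j pi_ij * y_jw, a weighted vote in [-1, 1]
    return 1.0 / (1.0 + np.exp(-(alpha_w * x_iw + beta_w)))
```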

slide-21
SLIDE 21


Sigmoid function



slide-25
SLIDE 25


Feature Extraction

15 image representations:
• Global GIST descriptor
• Global colour histograms: RGB, HSV, LAB; 16-bin quantization
• Bag-of-Words histograms: SIFT and Hue descriptors; dense grid and Harris-Laplacian interest points; k-means quantization
• 3×1 spatial partitioning for BoW and colour histograms
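A sketch of two of these representations, under our own reading of the slide (16 bins per channel; the paper may quantize the channels jointly):

```python
def colour_histogram(img, bins=16):
    """Global colour histogram with 16-bin quantization per channel.

    img : (H, W, 3) uint8 image in RGB, HSV or LAB
    """
    h = np.concatenate([np.histogram(img[..., c], bins=bins, range=(0, 256))[0]
                        for c in range(3)]).astype(float)
    return h / h.sum()

def spatial_3x1(img, descriptor):
    """3x1 spatial partitioning: concatenate descriptors of 3 horizontal bands."""
    H = img.shape[0]
    return np.concatenate([descriptor(img[k * H // 3:(k + 1) * H // 3])
                           for k in range(3)])
```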

slide-26
SLIDE 26


Corel 5k

• 5,000 images (landscapes, animals, ...)
• max 5 tags per image (avg = 3)
• vocabulary size = 260

slide-27
SLIDE 27


ESP Game

• subset of 20,000 images out of 60k total (drawings, photos, ...)
• max 15 tags per image (avg = 5)
• vocabulary size = 268
• players annotate images in pairs

slide-28
SLIDE 28


IAPR TC12

• 20,000 images (tourist photos, sports, ...)
• max 23 tags per image (avg = 6)
• vocabulary size = 291
• tags extracted from descriptive text by natural language processing

slide-29
SLIDE 29


Evaluation Method

Annotation:
• Compute measures per keyword, then average
• Annotate each image with its top 5 keywords
• Recall (nr. correctly annotated / nr. of images with the word in the DB)
• Precision (nr. correctly annotated / nr. annotated)
• N+ (nr. of words with recall > 0)

Retrieval (search):
• Rank results according to the query keyword's presence probability
• Precision at n_w images (n_w: nr. of ground-truth images with w)
• Mean Average Precision (mAP) and Break-Even Point (BEP)
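A sketch of the per-keyword annotation measures (the empty-denominator convention is ours):

```python
def annotation_scores(pred, truth, vocab):
    """Per-keyword precision/recall under top-5 annotation, then averaged.

    pred  : list of sets, the 5 keywords predicted for each image
    truth : list of sets, the ground-truth keywords of each image
    """
    prec, rec, n_plus = [], [], 0
    for w in vocab:
        annotated = sum(w in p for p in pred)                 # images tagged with w
        in_db = sum(w in t for t in truth)                    # images carrying w
        correct = sum(w in p and w in t for p, t in zip(pred, truth))
        prec.append(correct / annotated if annotated else 0.0)
        if in_db:
            rec.append(correct / in_db)
            n_plus += correct > 0
    return sum(prec) / len(prec), sum(rec) / len(rec), n_plus
```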



slide-33
SLIDE 33


Results: Annotation

• Distance-based weighting outperforms rank-based weighting
• Sigmoidal modulation improves recall but loses some precision
• Metric learning gives significantly better results!

slide-34
SLIDE 34


Results Improvement

slide-35
SLIDE 35


Results: Recall

Method      All  Single  Multi  Easy  Difficult  All-BEP
PAMIR [7]    26    34     26     43      22        17
WN           32    40     31     49      28        24
σWN          31    41     30     49      27        23
WN-ML        36    43     35     53      32        27
σWN-ML       36    46     35     55      32        27

Table 4: Comparison of WN-ML and PAMIR [7] across word groups (All, Single, Multi, Easy, Difficult); All-BEP is the break-even point over all words.

slide-36
SLIDE 36


Annotation example: Corel 5k

slide-37
SLIDE 37


Retrieval example: Corel 5k


slide-39
SLIDE 39


Conclusion

State-of-the-art results!

Contributions:
• metric learning (no manual tuning)
• sigmoidal modulation to boost rare-word recall