Combining text/image in WikipediaMM task 2009


Combining text/image in WikipediaMM task 2009. Christophe Moulin, Cécile Barat, Cédric Lemaître, Mathias Géry, Christophe Ducottet, Christine Largeron. Laboratoire Hubert Curien, Saint-Étienne, France. October 1st 2009.

SLIDE 1

Combining text/image in WikipediaMM task 2009

Christophe Moulin, Cécile Barat, Cédric Lemaître, Mathias Géry, Christophe Ducottet, Christine Largeron

Laboratoire Hubert Curien, Saint-Étienne, France

October 1st 2009

Christophe Moulin et al. (LaHC) Combining text/image in WikipediaMM task 2009 October 1st 2009 1 / 16

SLIDE 2

Outline

1 Model overview
    Textual vector space model
    Visual vocabulary
    Combining text and image modalities
2 Experiments
3 Conclusion and future work


SLIDE 3

Model overview

A textual/visual model based on the bag of words approach

[Figure: documents → indexing → combining, with the two modality scores weighted by α and (1 − α)]


SLIDE 4

Model overview Textual vector space model

Textual vocabulary creation

Main steps of the textual bag of words creation:

stop words filtering → Porter stemming → bag of words creation
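The three steps above can be sketched in a few lines of Python. This is only an illustration: the stop word list is a tiny hypothetical one, and `toy_stem` is a crude suffix stripper standing in for the Porter algorithm (a real pipeline would call an actual Porter implementation such as `nltk.stem.PorterStemmer`).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (assumption, not the authors' list).
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def toy_stem(word: str) -> str:
    """Crude suffix stripper used as a stand-in; this is NOT the Porter algorithm."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def bag_of_words(text: str) -> Counter:
    """Lowercase, tokenize, drop stop words, stem, and count the remaining terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [toy_stem(t) for t in tokens if t not in STOP_WORDS]
    return Counter(kept)


bow = bag_of_words("The bridges and the bridge in the painting")
print(bow)
```

The resulting counter maps each stemmed term to its number of occurrences, which is the raw material for the weighting step.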


SLIDE 5

Model overview Textual vector space model

Textual vector weighting

Salton-based tf.idf weighting [1]: bag of words → vector of tf.idf weights [2]

$w_{i,j} = tf_{i,j} \cdot idf_j$

$tf_{i,j}$: representativeness
$idf_j$: discrimination power

[1]: Salton et al. A vector space model for automatic indexing, 1975
[2]: Robertson et al. Okapi at TREC-3, 1994
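A minimal sketch of this weighting, assuming that i indexes documents, j indexes terms, tf is the raw term count, and idf is log(N / df). The slides do not spell out the exact tf and idf variants (they cite Okapi [2]), so these choices are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[Counter]) -> list[dict[str, float]]:
    """Weight each term j of each document i by w[i][j] = tf[i][j] * idf[j]."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in docs for term in doc)
    # Assumed idf variant: log(N / df); rarer terms get a larger weight.
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in doc.items()} for doc in docs]


docs = [
    Counter({"bridge": 2, "paint": 1}),
    Counter({"paint": 3}),
    Counter({"river": 1, "bridge": 1}),
]
weights = tfidf_vectors(docs)
print(weights[2])
```

In the third document, "river" occurs in fewer documents than "bridge", so it receives the larger weight even though both have the same term frequency.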


SLIDE 6

Model overview Textual vector space model

Exploiting the text around an image

Two sources of text:

metadata of the Wikipedia image, as used in ImageCLEFwiki
text extracted from the original Wikipedia article (n characters around the image)
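Extracting the text around an image could look like the sketch below. It assumes the image is located by a literal marker string in the article text and that n characters are taken on each side; the slides only say "n char around the image", so the marker format and the symmetric window are assumptions.

```python
def text_around_image(article: str, image_marker: str, n: int) -> str:
    """Return up to n characters on each side of the first occurrence of image_marker."""
    pos = article.find(image_marker)
    if pos == -1:
        return ""  # the image is not referenced in this article
    before = article[max(0, pos - n):pos]
    end = pos + len(image_marker)
    after = article[end:end + n]
    return before + after


# Hypothetical article text and marker, for illustration only.
article = "The old stone bridge [[Image:bridge.jpg]] crosses the river near the town."
snippet = text_around_image(article, "[[Image:bridge.jpg]]", 10)
print(repr(snippet))
```

The extracted snippet can then be appended to the image metadata before the bag of words is built.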


SLIDE 7

Model overview Visual vocabulary

Visual representation

Similar to the text representation, using a visual codebook [3]

Visual vocabulary creation: descriptors → visual vocabulary
Image representation: descriptors → projection → bag of visual words → vector of tf.idf weights

[3]: Jurie et al. Creating efficient codebooks for visual recognition, 2005
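The projection step can be sketched as a nearest-neighbour assignment of each descriptor to a visual word. The codebook is taken as given here; how it is built (for example by clustering training descriptors) is not stated in the slides, and the squared Euclidean distance is an assumption.

```python
from collections import Counter


def nearest_word(descriptor: list[float], codebook: list[list[float]]) -> int:
    """Index of the codebook entry closest to descriptor (squared Euclidean distance)."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def bag_of_visual_words(descriptors: list[list[float]], codebook: list[list[float]]) -> Counter:
    """Count how many descriptors of one image fall on each visual word."""
    return Counter(nearest_word(d, codebook) for d in descriptors)


# Toy 2-dimensional codebook and descriptors (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
descriptors = [[0.1, 0.2], [0.9, 1.1], [4.0, 6.0], [1.2, 0.8]]
bovw = bag_of_visual_words(descriptors, codebook)
print(bovw)
```

The counter plays the same role as the textual bag of words, so the same tf.idf weighting can be applied to it.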


SLIDE 8

Model overview Visual vocabulary

Visual features computation

Two different descriptors are used, computed on:

regular partitioning: 16 × 16 cells
interest regions based on the MSER detector

Resulting visual vocabularies:

meanstd (6 dimensions: 9350 visual words)
sift1 (128 dimensions: 9303 visual words)
sift2 (128 dimensions: 9630 visual words)
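As an illustration of a 6-dimensional descriptor on a regular partition, the sketch below computes the mean and standard deviation of each RGB channel per cell. Reading "meanstd" as per-channel mean and standard deviation is an assumption suggested by the name and the dimensionality, and the grid size is left as a parameter because the slides do not say whether 16 × 16 refers to the number of cells or their pixel size.

```python
from statistics import fmean, pstdev

Pixel = tuple[int, int, int]  # (R, G, B)


def meanstd_descriptors(image: list[list[Pixel]], rows: int, cols: int) -> list[list[float]]:
    """Split image into rows x cols cells; per cell return [mean R, G, B, std R, G, B]."""
    height, width = len(image), len(image[0])
    descriptors = []
    for r in range(rows):
        for c in range(cols):
            cell = [
                image[y][x]
                for y in range(r * height // rows, (r + 1) * height // rows)
                for x in range(c * width // cols, (c + 1) * width // cols)
            ]
            channels = list(zip(*cell))  # three tuples: all R, all G, all B values
            descriptors.append([fmean(ch) for ch in channels] + [pstdev(ch) for ch in channels])
    return descriptors


# Tiny 2 x 4 toy image split into 1 x 2 cells (hypothetical pixel values).
image = [
    [(0, 0, 0), (2, 2, 2), (10, 0, 0), (10, 0, 0)],
    [(0, 0, 0), (2, 2, 2), (10, 0, 0), (10, 0, 0)],
]
descs = meanstd_descriptors(image, rows=1, cols=2)
print(descs)
```

The image must be at least as large as the grid, otherwise a cell would be empty and the statistics undefined.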


SLIDE 9

Model overview Combining text and image modalities

Score matching

Distance computed between query and document vectors

        query   document
score1  tf      tf.idf
score2  tf.idf  tf.idf
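The two scores differ only in how the query vector is weighted: tf for score1, tf.idf for score2. The sketch below uses cosine similarity as the matching function, which is an assumption since the slides do not name the distance; the idf values and vectors are hypothetical.

```python
import math


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as term -> weight dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


idf = {"bridge": 0.4, "river": 1.1, "paint": 0.4}  # hypothetical idf values
query_tf = {"bridge": 1.0, "river": 1.0}
query_tfidf = {term: tf * idf[term] for term, tf in query_tf.items()}
doc_tfidf = {"bridge": 0.8, "paint": 0.4}

score1 = cosine(query_tf, doc_tfidf)     # tf query against tf.idf document
score2 = cosine(query_tfidf, doc_tfidf)  # tf.idf query against tf.idf document
print(score1, score2)
```

With tf.idf on the query side, a rare query term that the document lacks ("river" here) penalizes the match more strongly than it does with plain tf.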


SLIDE 10

Model overview Combining text and image modalities

Linear combination of textual and visual scores

[Figure: model overview, with the two modality scores weighted by α and (1 − α)]

α is fixed globally on ImageCLEFwiki 2008
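A minimal sketch of the linear combination, assuming α weights the visual score and (1 − α) the textual one. The slides show only "α + (1 − α)", so this assignment is an assumption, suggested by the small α values reported in the results. Document identifiers and scores are hypothetical.

```python
def combine(text_score: float, visual_score: float, alpha: float) -> float:
    """Assumed form: alpha weights the visual score, (1 - alpha) the textual one."""
    return alpha * visual_score + (1 - alpha) * text_score


def rank(text_scores: dict[str, float], visual_scores: dict[str, float], alpha: float) -> list[str]:
    """Rank documents by fused score; a document missing from one modality scores 0 there."""
    docs = set(text_scores) | set(visual_scores)
    fused = {
        d: combine(text_scores.get(d, 0.0), visual_scores.get(d, 0.0), alpha)
        for d in docs
    }
    # Sort by decreasing score, breaking ties by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


text_scores = {"a": 0.50, "b": 0.48, "c": 0.10}
visual_scores = {"a": 0.10, "b": 0.90, "c": 0.20}

text_only = rank(text_scores, visual_scores, alpha=0.0)
fused_run = rank(text_scores, visual_scores, alpha=0.1)
print(text_only, fused_run)
```

Fixing α globally means the same value is used for every query; in this toy case a small α is already enough to swap the two top documents.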


SLIDE 11

Experiments

Global results

rank  participant/score  text      image              map     num ret  num rel ret
1     deuceng            TXT       -                  0.2397  43052    1351
5     lahc/score2        100 char  meanstd (α=0.025)  0.2178  44993    1213
6     lahc/score2        50 char   meanstd (α=0.025)  0.2148  44993    1218
14    lahc/score2        metadata  sift2 (α=0.084)    0.1903  44993    1212
15    lahc/score2        100 char  -                  0.1890  38004    1205
16    lahc/score2        50 char   -                  0.1880  37041    1198
20    lahc/score2        metadata  meanstd (α=0.025)  0.1845  44993    1208
21    lahc/score2        metadata  sift1 (α=0.012)    0.1807  44995    1200
24    lahc/score2        metadata  meanstd (α=0.015)  0.1792  44993    1213
33    lahc/score2        metadata  -                  0.1667  35611    1192
44    lahc/score1        metadata  -                  0.1432  35611    1164
52    lahc/score2        metadata  sift2              0.0365  619      142
53    lahc/score2        metadata  meanstd            0.0338  574      76
54    lahc/score2        metadata  sift1              0.0321  637      120
57    sztaki             -         -                  0.0068  44993    80


SLIDE 12

Experiments

Textual results

[Figure: curves for score1 (map: 0.1432), score2 (map: 0.1667), score2 50 char (map: 0.1880), score2 100 char (map: 0.1890)]

Improvements provided by additional text (15%)


SLIDE 13

Experiments

Textual+visual results

[Figure: curves for score2 (map: 0.1667), score2 sift1: α=0.012 (map: 0.1807), score2 meanstd: α=0.025 (map: 0.1845), score2 sift2: α=0.084 (map: 0.1903)]

sift2 > meanstd > sift1


SLIDE 14

Experiments

Best results

[Figure: curves for score2 50 char (map: 0.1880), score2 100 char (map: 0.1890), score2 50 char + meanstd (map: 0.2148), score2 100 char + meanstd (map: 0.2178)]

Improvements provided by visual information (15%)
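The percentages quoted on these slides read as relative MAP gains. The short computation below applies that reading to the reported MAP values; which pairs of runs the 15% figures actually refer to is an assumption, so the two comparisons shown are only plausible candidates.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# MAP values taken from the results reported in the slides.
text_gain = relative_gain(0.1667, 0.1890)    # metadata only -> metadata + 100 char
visual_gain = relative_gain(0.1890, 0.2178)  # 100 char -> 100 char + meanstd
print(f"text: {text_gain:.1f}%  visual: {visual_gain:.1f}%")
```

Under this reading the additional text brings a gain of roughly 13% and the visual information roughly 15%, in line with the rounded figures on the slides.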


SLIDE 15

Conclusion and future work

Conclusion

Improvement over last year's model

It works:

Text around the image in the original Wikipedia articles (+15%)
Addition of visual features (MSER+sift), with color/texture complementarity
Text-image combination (+15%)


SLIDE 16

Conclusion and future work

Future work

Combination with more than one visual descriptor
Other fusion methods
Learn α for each query
