
Combining text/image in WikipediaMM task 2009

Christophe Moulin, Cécile Barat, Cédric Lemaître, Mathias Géry, Christophe Ducottet, Christine Largeron. Laboratoire Hubert Curien, Saint-Étienne, France. October 1st 2009.


  1. Combining text/image in WikipediaMM task 2009. Christophe Moulin, Cécile Barat, Cédric Lemaître, Mathias Géry, Christophe Ducottet, Christine Largeron. Laboratoire Hubert Curien (LaHC), Saint-Étienne, France. October 1st 2009.

  2. Outline.
     1. Model overview: textual vector space model; visual vocabulary; combining text and image modalities
     2. Experiments
     3. Conclusion and future work

  3. Model overview. A textual/visual model based on the bag-of-words approach. Pipeline: documents → indexing → combining, where the textual and visual scores are weighted by α and (1 − α).

  4. Model overview: Textual vector space model. Textual vocabulary creation. Main steps of the textual bag-of-words creation: stop-word filtering → Porter stemming → bag-of-words creation.
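The three steps on this slide can be sketched in a few lines of Python. This is only an illustrative sketch: the stop list is a tiny hypothetical one, and `crude_stem` is a simple suffix stripper standing in for the Porter stemmer named on the slide, not an implementation of it.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (the slides do not give the real one).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def crude_stem(token: str) -> str:
    """Stand-in for Porter stemming: strips a plural 's', then 'ing' or 'ed'."""
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        token = token[:-1]
    for suffix in ("ing", "ed"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def bag_of_words(text: str) -> Counter:
    """Lowercase, tokenize, drop stop words, stem, and count the stems."""
    tokens = re.findall(r"[a-z]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    return Counter(crude_stem(t) for t in kept)


bag = bag_of_words("The paintings and the painted image of images")
```

Here `bag` maps each stem to its count, so the different surface forms of "paint" and "image" collapse onto one entry each. A real system would swap in a full Porter stemmer and a complete stop list.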

  5. Model overview: Textual vector space model. Textual vector weighting. Salton-based tf.idf weighting [1]: bag of words → vector of tf.idf weights, with $w_{i,j} = tf_{i,j} \cdot idf_j$ [2], where $tf_{i,j}$ measures representativeness and $idf_j$ measures discrimination power.
     [1]: Salton et al. A vector space model for automatic indexing, 1975.
     [2]: Robertson et al. Okapi at TREC-3, 1994.
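As a rough illustration of the weighting $w_{i,j} = tf_{i,j} \cdot idf_j$, the sketch below computes tf.idf vectors for a small collection. It assumes that $i$ indexes documents and $j$ indexes terms, uses the raw count as tf, and uses the plain $\log(N / df_j)$ form of idf; the slide does not state which tf and idf variants were actually used, so these are assumptions. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Assumed idf variant: log(N / df_j).
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # assumed tf variant: raw term count
        vectors.append({term: freq * idf[term] for term, freq in tf.items()})
    return vectors


vectors = tfidf_vectors([["cat", "cat", "dog"], ["dog", "bird"]])
```

In this toy collection "dog" appears in every document, so its idf and therefore its weight is zero, while "cat" is weighted by both its frequency and its rarity.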

  6. Model overview: Textual vector space model. Exploiting the text around an image. Two sources of text are used: the metadata of the Wikipedia image used in ImageCLEFwiki, and text extracted from the original Wikipedia article (n characters around the image).

  7. Model overview: Visual vocabulary. Visual representation. Similar to the text representation, using a visual codebook [3].
     Visual vocabulary creation: descriptors → visual vocabulary.
     Image representation: descriptors → projection → bag of visual words → vector of tf.idf weights.
     [3]: Jurie et al. Creating efficient codebooks for visual recognition, 2005.
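The projection step can be pictured as assigning each local descriptor to its nearest codebook entry and counting the assignments. The sketch below assumes a codebook that has already been built (for example by clustering descriptors) and uses squared Euclidean distance; both the distance and the function names are assumptions made for illustration, not details given on the slide.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the codebook entry (visual word) closest to the descriptor."""
    return min(range(len(codebook)),
               key=lambda k: squared_distance(descriptor, codebook[k]))


def bag_of_visual_words(descriptors, codebook):
    """Count how many descriptors of an image fall on each visual word."""
    return Counter(nearest_word(d, codebook) for d in descriptors)


codebook = [(0.0, 0.0), (10.0, 10.0)]            # toy 2-word vocabulary
descriptors = [(1, 0), (9, 9), (0, 2), (11, 10)]  # toy descriptors of one image
visual_bag = bag_of_visual_words(descriptors, codebook)
```

The resulting counts play the same role as term frequencies in the textual model, which is why the same tf.idf weighting can then be applied to them.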

  8. Model overview: Visual vocabulary. Visual features computation. Two different descriptors are used.
     Regular partitioning (16 × 16 cells): meanstd (6 dimensions: 9350 visual words); sift 2 (128 dimensions: 9630 visual words).
     Interest regions based on the MSER detector: sift 1 (128 dimensions: 9303 visual words).
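One plausible reading of the 6-dimensional meanstd descriptor is the mean and standard deviation of each of three color channels inside a cell. The slide gives only the name and the dimensionality, so this interpretation, the RGB channel choice, and the function name `meanstd_descriptor` are all assumptions.

```python
from statistics import fmean, pstdev


def meanstd_descriptor(cell_pixels):
    """cell_pixels: list of (r, g, b) tuples for one cell.

    Returns [mean_r, mean_g, mean_b, std_r, std_g, std_b] (assumed layout).
    """
    channels = list(zip(*cell_pixels))    # regroup pixels into three channels
    means = [fmean(c) for c in channels]
    stds = [pstdev(c) for c in channels]  # population standard deviation
    return means + stds


descriptor = meanstd_descriptor([(0, 10, 20), (2, 10, 40)])
```

Computing one such vector per cell of the regular grid would give the set of descriptors that is then projected onto the visual vocabulary.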

  9. Model overview: Combining text and image modalities. Score matching. A distance is computed between query and document vectors.
     score 1: query weighted by tf, documents weighted by tf.idf.
     score 2: query weighted by tf.idf, documents weighted by tf.idf.
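The slide does not say which measure is used to compare query and document vectors. As a hedged example, the sketch below uses cosine similarity over sparse `{term: weight}` dictionaries, a common choice in vector space models; it is an assumption, not necessarily the measure used by the authors.

```python
import math


def cosine_score(query_vec, doc_vec):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
    q_norm = math.sqrt(sum(w * w for w in query_vec.values()))
    d_norm = math.sqrt(sum(w * w for w in doc_vec.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0  # an empty vector matches nothing
    return dot / (q_norm * d_norm)


score = cosine_score({"paint": 1.0}, {"paint": 3.0, "image": 4.0})
```

The same function works for either weighting of the query, since only the weights stored in `query_vec` change.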

  10. Model overview: Combining text and image modalities. Linear combination of the textual and visual scores, weighted by α and (1 − α). α is fixed globally (one value for all queries) on ImageCLEFwiki 2008.
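A minimal sketch of such a linear combination follows. The slide does not show which modality is multiplied by α; since the reported α values are small and listed next to the visual descriptors, the sketch assumes α weights the visual score and (1 − α) the textual score. The function names and the toy scores are hypothetical.

```python
def combined_score(text_score, visual_score, alpha):
    """Assumed form: alpha * visual + (1 - alpha) * textual."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * visual_score + (1.0 - alpha) * text_score


def rank(documents, alpha):
    """documents: {doc_id: (text_score, visual_score)}. Best document first."""
    return sorted(documents,
                  key=lambda d: combined_score(*documents[d], alpha),
                  reverse=True)


docs = {"a": (0.9, 0.1), "b": (0.8, 0.9)}  # toy (text, visual) scores
ranking_text_heavy = rank(docs, alpha=0.025)
ranking_balanced = rank(docs, alpha=0.5)
```

With a small α the textual score dominates and document "a" stays first; with α = 0.5 the strong visual match promotes "b". A global α means the same value is applied to every query.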

  11. Experiments: Global results.

      | rank | participant/score | text     | image             | MAP    | num ret | num rel ret |
      |------|-------------------|----------|-------------------|--------|---------|-------------|
      | 1    | deuceng           | TXT      | -                 | 0.2397 | 43052   | 1351        |
      | 5    | lahc/score 2      | 100 char | meanstd (α=0.025) | 0.2178 | 44993   | 1213        |
      | 6    | lahc/score 2      | 50 char  | meanstd (α=0.025) | 0.2148 | 44993   | 1218        |
      | 14   | lahc/score 2      | metadata | sift 2 (α=0.084)  | 0.1903 | 44993   | 1212        |
      | 15   | lahc/score 2      | 100 char | -                 | 0.1890 | 38004   | 1205        |
      | 16   | lahc/score 2      | 50 char  | -                 | 0.1880 | 37041   | 1198        |
      | 20   | lahc/score 2      | metadata | meanstd (α=0.025) | 0.1845 | 44993   | 1208        |
      | 21   | lahc/score 2      | metadata | sift 1 (α=0.012)  | 0.1807 | 44995   | 1200        |
      | 24   | lahc/score 2      | metadata | meanstd (α=0.015) | 0.1792 | 44993   | 1213        |
      | 33   | lahc/score 2      | metadata | -                 | 0.1667 | 35611   | 1192        |
      | 44   | lahc/score 1      | metadata | -                 | 0.1432 | 35611   | 1164        |
      | 52   | lahc/score 2      | metadata | sift 2            | 0.0365 | 619     | 142         |
      | 53   | lahc/score 2      | metadata | meanstd           | 0.0338 | 574     | 76          |
      | 54   | lahc/score 2      | metadata | sift 1            | 0.0321 | 637     | 120         |
      | 57   | sztaki            | -        | IMG               | 0.0068 | 44993   | 80          |

  12. Experiments: Textual results. [Plot comparing four runs: score 1 (MAP 0.1432), score 2 (MAP 0.1667), score 2 with 50 char (MAP 0.1880), score 2 with 100 char (MAP 0.1890).] Improvements provided by additional text (15%).

  13. Experiments: Textual+visual results. [Plot comparing four runs: score 2 (MAP 0.1667), score 2 with sift 1, α=0.012 (MAP 0.1807), score 2 with meanstd, α=0.025 (MAP 0.1845), score 2 with sift 2, α=0.084 (MAP 0.1903).] Ordering of the descriptors: sift 2 > meanstd > sift 1.

  14. Experiments: Best results. [Plot comparing four runs: score 2 with 50 char (MAP 0.1880), score 2 with 100 char (MAP 0.1890), score 2 with 50 char + meanstd (MAP 0.2148), score 2 with 100 char + meanstd (MAP 0.2178).] Improvements provided by visual information (15%).
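The quoted improvements can be checked against the MAP values reported on the slides. The short sketch below computes the relative gain for the two headline comparisons: metadata only (0.1667) versus 100 characters of article text (0.1890), and 100 characters of text (0.1890) versus the same run with meanstd added (0.2178). The helper name `relative_gain` is hypothetical.

```python
def relative_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`."""
    return (improved - baseline) / baseline


text_gain = relative_gain(0.1667, 0.1890)    # additional article text
visual_gain = relative_gain(0.1890, 0.2178)  # adding the meanstd descriptor

print(f"text: {text_gain:.1%}, visual: {visual_gain:.1%}")
```

From these numbers the gain from additional text comes out at roughly 13% and the gain from visual information at roughly 15%, so the rounded 15% quoted on the slides is closest for the visual case.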

  15. Conclusion and future work: Conclusion. Improvement of last year's model. What works:
      - Text around the image in the original Wikipedia articles (+15%).
      - Addition of visual features (MSER + sift), exploiting color/texture complementarity.
      - Text-image combination (+15%).

  16. Conclusion and future work: Future work.
      - Combination with more than one visual descriptor.
      - Other fusion methods.
      - Learn α for each query.
