
Photo Annotation and Concept-Based Retrieval Tasks



  1. MLKD's Participation at the ImageCLEF 2011 Photo Annotation and Concept-Based Retrieval Tasks
     Eleftherios Spyromitros-Xioufis, Konstantinos Sechidis, Grigorios Tsoumakas and Ioannis Vlahavas
     Machine Learning and Knowledge Discovery Group, Department of Informatics, Aristotle University of Thessaloniki, Greece
     Outline: Photo Annotation, Concept-based Retrieval, Results, Conclusions
     CLEF 2011, 19-22 September 2011, Amsterdam
     Eleftherios Spyromitros-Xioufis | espyromi@csd.auth.gr

  2. Photo annotation task
     [Figure: example photo annotated with the concepts Sunset, Plants, Trees, Aesthetic, Day, Sky, Partly blurred, Calm, Outdoor, Cute]
     • A multi-label classification problem (each image can belong to many concepts)
     • Evaluation measures:
       1. Mean interpolated average precision (MIAP)
       2. Example-based F-measure (F-ex)
       3. Semantic R-precision (SR-Precision)
     • Model selection: based on Mean Average Precision (MAP)
     • MAP estimation: 3-fold cross-validation on the 8000 training images
     • 5 submissions in total:
       • Visual
       • Textual
       • Multi-modal (3 variations)
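
The measures listed on this slide can be illustrated with a small sketch. It computes plain (non-interpolated) average precision per concept, its mean over concepts (MAP, the quantity used for model selection), and an example-based F-measure. The function names, the toy scores, and the convention of scoring an image with no true and no predicted concepts as 1.0 are assumptions made for illustration; the official ImageCLEF evaluation (interpolated AP, semantic R-precision) is not reproduced here.

```python
def average_precision(scores, relevant):
    """Non-interpolated AP for one concept.

    scores[i] is the confidence for image i, relevant[i] is 1 if the
    concept is truly present in image i and 0 otherwise.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, i in enumerate(ranked, start=1):
        if relevant[i]:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant image
    return precision_sum / hits if hits else 0.0


def mean_average_precision(score_columns, label_columns):
    """Average the per-concept AP values over all concepts."""
    aps = [average_precision(s, y) for s, y in zip(score_columns, label_columns)]
    return sum(aps) / len(aps)


def example_based_f1(predicted_sets, true_sets):
    """F-measure computed per image on label sets, then averaged over images."""
    total = 0.0
    for predicted, true in zip(predicted_sets, true_sets):
        if not predicted and not true:
            total += 1.0  # assumption: empty prediction for an empty truth is perfect
            continue
        total += 2 * len(predicted & true) / (len(predicted) + len(true))
    return total / len(true_sets)


if __name__ == "__main__":
    # Toy data: 2 concepts scored on 3 images.
    scores = [[0.9, 0.8, 0.1], [0.2, 0.7, 0.1]]
    truth = [[1, 0, 1], [0, 1, 0]]
    print(mean_average_precision(scores, truth))
    print(example_based_f1([{"Sky", "Day"}], [{"Sky"}]))
```

MAP only needs a ranking of images per concept, whereas the example-based F-measure needs hard label sets per image, which is why the deck treats thresholding as a separate topic.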

  3. Visual model – feature extraction
     • The ColorDescriptor software [van de Sande et al., 2010] was used for visual feature extraction
       • 2 point detection strategies: Harris-Laplace, Dense Sampling
       • 7 descriptors: SIFT, HSV-SIFT, HueSIFT, OpponentSIFT, C-SIFT, rgSIFT and RGB-SIFT
     • Codebook generation
       • K-means (other?) clustering on 250,000 randomly sampled points (more points?)
       • Codebook size (k) fixed to 4096 words (more words?)
       • Hard assignment of points to clusters
     • 14 multi-label training datasets in total (2 point detection strategies × 7 descriptors)
       • #features: 4096
       • #labels: 99
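
Hard assignment, as named on this slide, can be sketched as follows: every local descriptor of an image votes for its single nearest codebook word, and the vote counts form the image's feature vector. The toy 3-word, 2-dimensional codebook and the function names are assumptions for illustration; the slides use a 4096-word codebook obtained with k-means, and whether the resulting histograms were normalized is not stated.

```python
def nearest_word(descriptor, codebook):
    """Index of the codebook word closest to the descriptor (squared Euclidean distance)."""
    best_index, best_distance = 0, float("inf")
    for index, centroid in enumerate(codebook):
        distance = sum((d - c) ** 2 for d, c in zip(descriptor, centroid))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def bag_of_words(descriptors, codebook):
    """Histogram of hard assignments: one count per codebook word."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_word(descriptor, codebook)] += 1
    return histogram


if __name__ == "__main__":
    # Toy codebook with 3 words; in the slides k = 4096.
    codebook = [[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]]
    # Four local descriptors extracted from one hypothetical image.
    descriptors = [[1.0, 1.0], [9.0, 9.0], [0.0, 1.0], [1.0, 9.0]]
    print(bag_of_words(descriptors, codebook))  # [2, 1, 1]
```

Running this once per descriptor type and point detection strategy would give one feature vector per combination, which matches the 14 datasets mentioned on the slide.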

  4. Visual model – learning method
     • The Binary Relevance (problem transformation) method was used:
       • Transforms the multi-label classification task into multiple binary classification tasks
       • Any single-label classifier can be used (Random Forest, #trees: 150, #features: 40)
     • Instance weighting to deal with class imbalance: $w_{min} = \frac{min + maj}{min}$, $w_{maj} = \frac{min + maj}{maj}$
     Training set for λ_1 (feature space: f_1 … f_4096; target: λ_1):

              f_1  f_2  …  f_4096 | λ_1  λ_2  …  λ_99
       x_1     1    0   …    1    |  0    1   …   1
       x_2     0    1   …    0    |  1    0   …   0
       …       …    …   …    …    |  …    …   …   …
       x_8K    0    0   …    1    |  0    0   …   1

  5. Visual model – learning method (continued)
     Same Binary Relevance setup as on the previous slide. Training set for λ_2: the feature space (f_1 … f_4096) and the examples (x_1 … x_8K) are unchanged; the target column is now λ_2.

  6. Visual model – learning method (continued)
     Same Binary Relevance setup as on the previous two slides. Training set for λ_99: the feature space (f_1 … f_4096) and the examples (x_1 … x_8K) are unchanged; the target column is now λ_99.
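
The three slides above step through one binary training set per label. A minimal sketch of that transformation and of the instance weights follows, assuming the weights are (min + maj)/min for minority-class examples and (min + maj)/maj for majority-class examples, where min and maj are the class sizes. The function names and the toy label matrix are illustrative, and falling back to unit weights when a label has only one class is an assumption of this sketch.

```python
def binary_relevance_targets(label_matrix):
    """Split an n x q 0/1 label matrix into q binary target vectors, one per label."""
    num_labels = len(label_matrix[0])
    return [[row[j] for row in label_matrix] for j in range(num_labels)]


def instance_weights(target):
    """Weight every example by (min + maj) / (size of its own class).

    This gives (min + maj)/min to the minority class and (min + maj)/maj
    to the majority class, so both classes carry the same total weight.
    """
    positives = sum(target)
    negatives = len(target) - positives
    if positives == 0 or negatives == 0:
        return [1.0] * len(target)  # assumption: nothing to balance
    total = positives + negatives
    weight_positive = total / positives
    weight_negative = total / negatives
    return [weight_positive if y == 1 else weight_negative for y in target]


if __name__ == "__main__":
    # Toy label matrix: 4 images, 2 labels (the slides have 8000 images, 99 labels).
    labels = [[1, 0], [0, 1], [0, 1], [0, 0]]
    for j, target in enumerate(binary_relevance_targets(labels), start=1):
        print("label", j, target, instance_weights(target))
```

Each (target, weights) pair would then be handed, together with the shared feature matrix, to any single-label learner that accepts per-example weights.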

  7. Textual model – feature extraction
     • Flickr user tags were used
     • Initial vocabulary: the union of tag sets of the training images
     • Stemming: Porter stemmer (English..) & stop word removal -> 27000 stems
     • Feature selection using the $\chi^2_{max}$ criterion [Lewis et al., 2004]:
       • The $\chi^2$ statistic for each feature with respect to each label is calculated
       • Features are ranked according to their maximum $\chi^2$ score across all labels
     • After evaluation of different sizes, the top 4000 features were selected
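
The ranking by maximum chi-square score can be sketched as below. The sketch assumes binary tag-presence features and uses the standard chi-square statistic of a 2×2 contingency table; the function names and toy data are illustrative, and returning 0 when a row or column of the table is empty is a convention chosen here, not something the slides specify.

```python
def chi_square(feature_column, label_column):
    """Chi-square statistic of the 2x2 table (feature present/absent x label on/off)."""
    n = len(feature_column)
    pairs = list(zip(feature_column, label_column))
    a = sum(1 for f, y in pairs if f and y)          # present, positive
    b = sum(1 for f, y in pairs if f and not y)      # present, negative
    c = sum(1 for f, y in pairs if not f and y)      # absent, positive
    d = n - a - b - c                                # absent, negative
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # assumption: a constant feature or label carries no signal
    return n * (a * d - c * b) ** 2 / denominator


def select_top_features(feature_columns, label_columns, k):
    """Rank features by their maximum chi-square score over all labels; keep the top k."""
    max_scores = [
        max(chi_square(feature, label) for label in label_columns)
        for feature in feature_columns
    ]
    ranked = sorted(range(len(feature_columns)), key=lambda i: max_scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: 3 stems and 2 labels over 4 images (the slides keep 4000 of 27000 stems).
    features = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
    labels = [[1, 1, 0, 0], [0, 0, 0, 1]]
    print(select_top_features(features, labels, k=2))  # [0, 1]
```

Taking the maximum over labels keeps a stem that is highly informative for even a single concept, which suits a setting with many rare concepts.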

  8. Textual model – learning method
     • Ensemble of Classifier Chains (ECC) [Read et al., 2009]:
       • Random chains are created
       • The feature set for each label in a chain is augmented with the previous labels in that chain
       • Able to capture label correlations, but class imbalance is still a problem
     Chain order: 1, 2, …, 99
     Training set for λ_1 (feature space: f_1 … f_4000; target: λ_1):

              f_1  f_2  …  f_4000 | λ_1  λ_2  …  λ_99
       x_1     1    0   …    1    |  0    1   …   1
       x_2     0    1   …    0    |  1    0   …   0
       …       …    …   …    …    |  …    …   …   …
       x_8K    0    0   …    1    |  0    0   …   1

  9. Textual model – learning method (continued)
     Same ECC setup as on the previous slide, chain order 1, 2, …, 99. Training set for λ_2: the feature space is f_1 … f_4000 augmented with the previous label λ_1; the target column is now λ_2.

  10. Textual model – learning method
     • ECC is also a problem transformation method:
       • Again coupled with Random Forest as base classifier (#trees: 10, #features: default)
       • Ensemble size: 15 (15 chains × 10 trees = 150 random trees in total for each label)
       • Again instance weighting for class imbalance
     Chain order: 1, 2, …, 99
     Training set for λ_99: the feature space is f_1 … f_4000 augmented with the previous labels λ_1 … λ_98; the target column is λ_99.
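
How the training sets change along a chain can be sketched as follows. The sketch assumes, as is usual for classifier chains, that the true values of the preceding labels are appended to the features at training time; it only builds the per-label training sets and the random chain orders, and leaves the base learner out. Function names, the seed, and the toy data are illustrative.

```python
import random


def chain_training_sets(features, labels, chain_order):
    """Build one (label index, augmented inputs, targets) triple per chain position.

    The inputs for the label at position p are the original features plus
    the values of the labels at positions 0..p-1 of the chain.
    """
    training_sets = []
    for position, label_index in enumerate(chain_order):
        previous = chain_order[:position]
        inputs = [list(x) + [y[p] for p in previous] for x, y in zip(features, labels)]
        targets = [y[label_index] for y in labels]
        training_sets.append((label_index, inputs, targets))
    return training_sets


def random_chain_orders(num_labels, ensemble_size, seed=0):
    """One random permutation of the labels per ensemble member."""
    rng = random.Random(seed)
    orders = []
    for _ in range(ensemble_size):
        order = list(range(num_labels))
        rng.shuffle(order)
        orders.append(order)
    return orders


if __name__ == "__main__":
    # Toy data: 2 images, 2 features, 3 labels.
    features = [[0.5, 1.0], [0.0, 2.0]]
    labels = [[1, 0, 1], [0, 1, 1]]
    for label_index, inputs, targets in chain_training_sets(features, labels, [0, 1, 2]):
        print(label_index, inputs, targets)
    # The slides use 99 labels and an ensemble of 15 chains.
    print(len(random_chain_orders(num_labels=99, ensemble_size=15)))
```

With the fixed order 0, 1, 2 the last label sees two extra input columns, mirroring the slide where the training set for the final label is augmented with all preceding labels.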

  11. Multi-modal
     [Diagram: an image $x_i$ is passed to the Harris-Laplace model (7 descriptor average), the Dense-sampling model (7 descriptor average) and the Textual model, which output $p_{hl}(c_j \mid x_i)$, $p_{ds}(c_j \mid x_i)$ and $p_{flickr}(c_j \mid x_i)$ for all $j$; an Averaging/Arbitrator step combines them into $p(c_j \mid x_i)$ for all $j$]
     • A hierarchical late fusion scheme:
       • 3 different views of the images:
         • Harris-Laplace -> concepts related to objects (Fish and Ship)
         • Dense sampling -> concepts related to scenes (Night and Macro)
         • Textual -> concepts which are typically tagged by users (Dog, Insect, …)
       • 2 ways to combine the 3 different views:
         • Averaging
         • Arbitrator (the best view based on internal evaluation)
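
The two combination rules can be sketched as below. Averaging takes the mean of the three per-concept probabilities; the arbitrator is sketched under the assumption that "best view" is decided separately for each concept from an internal score such as cross-validated average precision, which the slide does not spell out. View names, function names, and all numbers are illustrative.

```python
def average_fusion(view_scores):
    """Mean of the per-concept probabilities over all views, for one image."""
    views = list(view_scores.values())
    num_concepts = len(views[0])
    return [sum(view[j] for view in views) / len(views) for j in range(num_concepts)]


def pick_best_views(internal_scores):
    """For each concept, the name of the view with the highest internal score."""
    names = list(internal_scores)
    num_concepts = len(internal_scores[names[0]])
    return [max(names, key=lambda name: internal_scores[name][j]) for j in range(num_concepts)]


def arbitrator_fusion(view_scores, best_views):
    """For each concept, keep only the probability from that concept's best view."""
    return [view_scores[best_views[j]][j] for j in range(len(best_views))]


if __name__ == "__main__":
    # Probabilities for one image and 2 concepts from the 3 views (made-up numbers).
    view_scores = {"hl": [0.9, 0.2], "ds": [0.3, 0.8], "flickr": [0.6, 0.5]}
    # Hypothetical internal per-concept scores used by the arbitrator.
    internal_scores = {"hl": [0.7, 0.1], "ds": [0.4, 0.6], "flickr": [0.5, 0.3]}
    print(average_fusion(view_scores))
    best = pick_best_views(internal_scores)
    print(best, arbitrator_fusion(view_scores, best))
```

Averaging hedges across views, while the arbitrator commits to a single view per concept, so it benefits most when one view is clearly better suited to a concept, as the object, scene, and tag examples on the slide suggest.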
