Spatial and Temporal Representations for Multi-Modal Visual Retrieval

17th December 2018. Noa Garcia Docampo, PhD Candidate, Aston University.

  1. Spatial and Temporal Representations for Multi-Modal Visual Retrieval. 17th December 2018. Noa Garcia Docampo, PhD Candidate, Aston University.

  2. Introduction
      Millions of images are created every day.
      Problem: How to find images in large collections?

  3. Introduction
      Millions of images are created every day.
      Problem: How to find images in large collections?
      Solution: Visual retrieval!
      ● Image retrieval has existed since the 1990s
      ● Many types of visual retrieval

  5. Introduction We classify visual retrieval into 3 main types, depending on the query object and the dataset content:

  7. Structure
      ● Introduction and Background
      ● Symmetric Visual Retrieval
      ● Asymmetric Visual Retrieval
      ● Cross-Modal Retrieval
      ● Conclusions and Final Remarks

  9. Contributions
      Symmetric Visual Retrieval
      ● CNNs for non-metric visual similarity
      ● Pushing performance on standard CBIR datasets
      Asymmetric Visual Retrieval
      ● MoviesDB: image-to-video retrieval dataset
      ● Binary descriptors for local aggregation of video features
      ● Spatio-temporal encoders for global aggregation of video features
      ● Item video retrieval application
      Cross-Modal Retrieval
      ● SemArt: semantic art understanding dataset
      ● Cross-modal retrieval for semantic art understanding

  10. Introduction and Background ● Symmetric Visual Retrieval

  11. Symmetric Visual Retrieval: Standard CBIR system

  12. Symmetric Visual Retrieval: Drawbacks of metric distances [Figure: Standard CBIR system]
      ● Do not consider data distribution

  13. Symmetric Visual Retrieval: Drawbacks of metric distances [Figure: Standard CBIR system]
      ● Do not consider data distribution
      ● Metric distance constraints
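
The four constraints listed on this slide did not survive extraction. Assuming they are the standard axioms of a metric (an assumption, not text recovered from the slide), a distance $d$ over feature vectors $x$, $y$, $z$ would have to satisfy:

```latex
\begin{align}
d(x, y) &\ge 0 && \text{(non-negativity)} \\
d(x, y) &= 0 \iff x = y && \text{(identity of indiscernibles)} \\
d(x, y) &= d(y, x) && \text{(symmetry)} \\
d(x, z) &\le d(x, y) + d(y, z) && \text{(triangle inequality)}
\end{align}
```

A learned similarity score is not required to obey any of these, which is presumably the sense in which the proposed visual similarity is called non-metric.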

  16. Symmetric Visual Retrieval [Figure: Standard CBIR system vs. proposed CBIR system]
      Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal.

  17. Similarity Networks

  18. Symmetric Visual Retrieval: Off-the-shelf methods
      Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal.

  20. Symmetric Visual Retrieval: Off-the-shelf methods and fine-tuned methods
      Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal.

  22. Contributions
      Symmetric Visual Retrieval
      ● CNNs for non-metric visual similarity
      ● Pushing performance on standard CBIR datasets

  23. Introduction and Background ● Symmetric Visual Retrieval ● Asymmetric Visual Retrieval

  24. Asymmetric Visual Retrieval
      Garcia & Vogiatzis (2018). Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval. In: ICMR 2018.

  26. Asymmetric Visual Retrieval [Figure labels: Chapter 5; Chapter 6; No temporal aggregation]
      Garcia & Vogiatzis (2018). Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval. In: ICMR 2018.

  28. Asymmetric Visual Retrieval: Temporal Local Aggregation, Feature Indexing
      Garcia & Vogiatzis (2017). Dress like a Star: Retrieving Fashion Products from Videos. In: CVF workshop, ICCV 2017.

  29. Asymmetric Visual Retrieval: Temporal Local Aggregation, Search and Retrieval
      Garcia & Vogiatzis (2017). Dress like a Star: Retrieving Fashion Products from Videos. In: CVF workshop, ICCV 2017.

  30. Asymmetric Visual Retrieval [Figure labels: Chapter 5; Chapter 6; No temporal aggregation]

  31. Asymmetric Visual Retrieval: Spatio-Temporal Global Aggregation
      Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018.

  32. Asymmetric Visual Retrieval
      Temporal Local Aggregation (Chapter 5)
      ● High accuracy
      ● High compression rates
      ● Multiple searches per query
      Spatio-Temporal Global Aggregation (Chapter 6)
      ● Global aggregation state-of-the-art accuracy
      ● High compression rates
      ● Single search per query
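
The difference between "multiple searches per query" and "single search per query" can be illustrated with a toy sketch. This is not the indexing scheme from the cited papers: the brute-force scan, the `max_dist` threshold, the voting rule, and the names `local_search` and `global_search` are all assumptions chosen for illustration. Binary descriptors are stored as Python integers and compared with Hamming distance; global embeddings are compared with a dot product.

```python
from collections import Counter


def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def local_search(query_descriptors, index, max_dist=2):
    """Local aggregation: one search per local query descriptor, then voting.

    `index` is a list of (binary_descriptor, shot_id) pairs. The brute-force
    scan stands in for a real index structure (assumption).
    """
    votes = Counter()
    for q in query_descriptors:                  # multiple searches per query
        for descriptor, shot_id in index:
            if hamming(q, descriptor) <= max_dist:
                votes[shot_id] += 1
    return votes.most_common(1)[0][0] if votes else None


def global_search(query_embedding, shot_embeddings):
    """Global aggregation: a single search over one embedding per shot."""
    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    return max(shot_embeddings,
               key=lambda shot_id: dot(query_embedding, shot_embeddings[shot_id]))


# Toy data (hypothetical): two shots, a few 8-bit descriptors.
index = [(0b10110010, "shot_a"), (0b10110011, "shot_a"), (0b01001100, "shot_b")]
query_descriptors = [0b10110000, 0b10110111]
best_local = local_search(query_descriptors, index)

shot_embeddings = {"shot_a": [0.9, 0.1], "shot_b": [0.1, 0.9]}
best_global = global_search([1.0, 0.0], shot_embeddings)
```

In the local variant the cost grows with the number of descriptors extracted from the query image, whereas the global variant issues exactly one comparison pass regardless of image content.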

  33. Contributions
      Symmetric Visual Retrieval
      ● CNNs for non-metric visual similarity
      ● Pushing performance on standard CBIR datasets
      Asymmetric Visual Retrieval
      ● MoviesDB: image-to-video retrieval dataset
      ● Binary descriptors for local aggregation of video features
      ● Spatio-temporal encoders for global aggregation of video features
      ● Item video retrieval application

  34. Introduction and Background ● Symmetric Visual Retrieval ● Asymmetric Visual Retrieval ● Cross-Modal Retrieval

  35. Cross-Modal Retrieval: Retrieve paintings from artistic comments
      ● Artistic comments:
        ○ Not only descriptions of the content but also about the author, context, techniques, etc.
      ● Fine-art paintings:
        ○ Figurative representations
      Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop, ECCV 2018.

  36. Cross-Modal Retrieval
      ● Visual encoding (images): VGG16, ResNet, RMAC
      ● Text encoding (comments and titles): BOW, MLP, RNN
      ● Cross-modal transformation: CCA, Cosine Margin Loss, Augmented with Metadata
      Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop, ECCV 2018.
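
As a rough illustration of the cosine margin loss option, the sketch below uses the common cosine-embedding form: matched image/text pairs are pulled towards similarity 1, and unmatched pairs are penalised only when their similarity exceeds a margin. The slide does not give the formula, so this form, the `margin` default, and the helper `rank_paintings` are assumptions; the vectors stand for already-encoded images and comments in a shared space.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm > 0 else 0.0


def cosine_margin_loss(image_vec, text_vec, is_match, margin=0.1):
    """Assumed form: 1 - sim for matching pairs, max(0, sim - margin) otherwise."""
    sim = cosine(image_vec, text_vec)
    if is_match:
        return 1.0 - sim
    return max(0.0, sim - margin)


def rank_paintings(text_vec, painting_vecs):
    """Return painting indices sorted from most to least similar to the comment."""
    return sorted(range(len(painting_vecs)),
                  key=lambda i: cosine(text_vec, painting_vecs[i]),
                  reverse=True)


# Toy embeddings (hypothetical values).
comment = [1.0, 0.0]
paintings = [[0.0, 1.0], [0.8, 0.2], [1.0, 0.0]]
ranking = rank_paintings(comment, paintings)
```

At retrieval time only the similarity is needed: the comment is encoded once and the paintings are sorted by cosine similarity, as `rank_paintings` does.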

  37. Cross-Modal Retrieval [Figure panels: Same type images; Random images; Human Comparison: Easy Set; Human Comparison: Difficult Set]
      Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop, ECCV 2018.

  38. Contributions
      Symmetric Visual Retrieval
      ● CNNs for non-metric visual similarity
      ● Pushing performance on standard CBIR datasets
      Asymmetric Visual Retrieval
      ● MoviesDB: image-to-video retrieval dataset
      ● Binary descriptors for local aggregation of video features
      ● Spatio-temporal encoders for global aggregation of video features
      ● Item video retrieval application
      Cross-Modal Retrieval
      ● SemArt: semantic art understanding dataset
      ● Cross-modal retrieval for semantic art understanding

  39. Introduction and Background ● Symmetric Visual Retrieval ● Asymmetric Visual Retrieval ● Cross-Modal Retrieval ● Conclusions and Final Remarks

  40. Future Work
      Symmetric Visual Retrieval
      ● Similarity networks for other retrieval tasks
      Asymmetric Visual Retrieval
      ● Temporal aggregation at the scene level
      ● Asymmetric techniques for video-to-image retrieval
      Cross-Modal Retrieval
      ● Style and content detector for cross-modal retrieval in art
      ● SemArt dataset for alternative tasks

  41. Q&A

  42. Introduction and Background ● Symmetric Visual Retrieval

  43. Content-Based Image Retrieval

  44. Similarity Networks
      ● Input: Concatenation of feature vectors
      ● Architecture: Fully connected layers with ReLU
      ● Output: Similarity score
      Loss function, annotated term: Network output

  45. Similarity Networks
      ● Input: Concatenation of feature vectors
      ● Architecture: Fully connected layers with ReLU
      ● Output: Similarity score
      Loss function, annotated term: Pair label

  46. Similarity Networks
      ● Input: Concatenation of feature vectors
      ● Architecture: Fully connected layers with ReLU
      ● Output: Similarity score
      Loss function, annotated term: Margin

  47. Similarity Networks
      ● Input: Concatenation of feature vectors
      ● Architecture: Fully connected layers with ReLU
      ● Output: Similarity score
      Loss function, annotated term: Standard similarity
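
The similarity network described on these slides can be sketched in a few lines of plain Python. The layer sizes, the random initialisation, and every function name below are hypothetical. The loss formula itself did not survive extraction, so `pair_loss` is only one plausible way to combine the four quantities the slides name (network output, pair label, margin, standard similarity), assuming cosine similarity as the standard similarity and labels of +1 for matching pairs and -1 otherwise.

```python
import math
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def dense(v, weights, bias):
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (hypothetical initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def similarity_network(feat_a, feat_b, layers):
    """Concatenate two feature vectors and map them to a scalar score."""
    h = list(feat_a) + list(feat_b)            # input: concatenation
    for weights, bias in layers[:-1]:
        h = relu(dense(h, weights, bias))      # fully connected + ReLU
    weights, bias = layers[-1]
    return dense(h, weights, bias)[0]          # output: similarity score


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def pair_loss(score, label, feat_a, feat_b, margin=0.2):
    """Assumed loss: |score - label * (cosine(feat_a, feat_b) + margin)|."""
    target = label * (cosine_similarity(feat_a, feat_b) + margin)
    return abs(score - target)


rng = random.Random(0)
dim = 4
layers = [init_layer(2 * dim, 8, rng),
          init_layer(8, 8, rng),
          init_layer(8, 1, rng)]
a = [rng.random() for _ in range(dim)]
b = [rng.random() for _ in range(dim)]
score = similarity_network(a, b, layers)
loss = pair_loss(score, +1, a, b)
```

Because the network sees the ordered concatenation of the two vectors, nothing forces `similarity_network(a, b, layers)` to equal `similarity_network(b, a, layers)`, so the learned score is free to violate symmetry and the other metric constraints.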
