Spatial and Temporal Representations for Multi-Modal Visual Retrieval - PowerPoint PPT Presentation


SLIDE 1

Spatial and Temporal representations for Multi-Modal Visual Retrieval

Noa Garcia Docampo

PhD Candidate, Aston University

17th December 2018

SLIDE 2

Introduction

Millions of images created every day... Problem: How to find images in large collections?

SLIDE 3

Introduction

Millions of images created every day... Problem: How to find images in large collections? Solution: Visual Retrieval!

  • Image retrieval has existed since the 1990s
  • Many types of visual retrieval

SLIDE 4

Introduction

Millions of images created every day... Problem: How to find images in large collections? Solution: Visual Retrieval!

  • Image retrieval has existed since the 1990s
  • Many types of visual retrieval

SLIDE 5

Introduction

We classify visual retrieval into 3 main types, depending on the query object and the dataset content:

SLIDE 6

Introduction

We classify visual retrieval into 3 main types, depending on the query object and the dataset content:

SLIDE 7

Structure

  • Introduction and Background
  • Symmetric Visual Retrieval
  • Asymmetric Visual Retrieval
  • Cross-Modal Retrieval
  • Conclusions and Final Remarks

SLIDE 8

Structure

  • Introduction and Background
  • Symmetric Visual Retrieval
  • Asymmetric Visual Retrieval
  • Cross-Modal Retrieval
  • Conclusions and Final Remarks

SLIDE 9

Contributions

  • CNNs for non-metric visual similarity
  • Pushing performance on standard CBIR datasets
  • MoviesDB: image-to-video retrieval dataset
  • Binary descriptors for local aggregation of video features
  • Spatio-temporal encoders for global aggregation of video features
  • Item video retrieval application
  • SemArt: semantic art understanding dataset
  • Cross-modal retrieval for semantic art understanding

Symmetric Visual Retrieval Asymmetric Visual Retrieval Cross-Modal Retrieval

SLIDE 10

  • Introduction and Background
  • Symmetric Visual Retrieval

SLIDE 11

Symmetric Visual Retrieval

Standard CBIR system
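A standard CBIR system extracts one feature vector per image and ranks the collection by a metric similarity to the query vector. A minimal sketch, with toy 3-D descriptors standing in for CNN features:

```python
import math

def cosine_similarity(a, b):
    # Metric-style similarity used in a standard CBIR system.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_database(query, database):
    # Return database indices sorted from most to least similar.
    scores = [(i, cosine_similarity(query, f)) for i, f in enumerate(database)]
    return [i for i, _ in sorted(scores, key=lambda t: -t[1])]

# Toy 3-D descriptors standing in for real CNN image features.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query = [1.0, 0.05, 0.0]
ranking = rank_database(query, database)  # most similar image first
```

The thesis argues that this fixed metric similarity is exactly the component worth replacing with a learned one.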

SLIDE 12

Symmetric Visual Retrieval

Standard CBIR system

Drawbacks of metric distances

  • Do not consider data distribution
SLIDE 13

Drawbacks of metric distances

  • Do not consider data distribution
  • Metric distance constraints:


Symmetric Visual Retrieval

Standard CBIR system

SLIDE 14

Drawbacks of metric distances

  • Do not consider data distribution
  • Metric distance constraints:

Symmetric Visual Retrieval

Standard CBIR system

SLIDE 15

Drawbacks of metric distances

  • Do not consider data distribution
  • Metric distance constraints:

Symmetric Visual Retrieval

Standard CBIR system

SLIDE 16

Symmetric Visual Retrieval

Standard CBIR system
Proposed CBIR system

Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal

SLIDE 17

Similarity Networks

SLIDE 18

Symmetric Visual Retrieval

Off-the-shelf methods

Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal

SLIDE 19

Symmetric Visual Retrieval

Off-the-shelf methods

Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal

SLIDE 20

Symmetric Visual Retrieval

Fine-tuned methods
Off-the-shelf methods

Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal

SLIDE 21

Symmetric Visual Retrieval

Fine-tuned methods
Off-the-shelf methods

Garcia & Vogiatzis (2018). Learning Non-Metric Visual Similarity for Image Retrieval. Under review at IMAVIS journal

SLIDE 22

Contributions

  • CNNs for non-metric visual similarity
  • Pushing performance on standard CBIR datasets

Symmetric Visual Retrieval

SLIDE 23

  • Introduction and Background
  • Symmetric Visual Retrieval
  • Asymmetric Visual Retrieval

SLIDE 24

Asymmetric Visual Retrieval

Garcia & Vogiatzis (2018). Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval. In: ICMR 2018

SLIDE 25

Asymmetric Visual Retrieval

Garcia & Vogiatzis (2018). Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval. In: ICMR 2018

SLIDE 26

Asymmetric Visual Retrieval

No temporal aggregation Chapter 5 Chapter 6

Garcia & Vogiatzis (2018). Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval. In: ICMR 2018

SLIDE 27

Asymmetric Visual Retrieval

No temporal aggregation Chapter 5 Chapter 6

Garcia & Vogiatzis (2018). Temporal Aggregation of Visual Features for Large-Scale Image-to-Video Retrieval. In: ICMR 2018

SLIDE 28

Asymmetric Visual Retrieval

Garcia & Vogiatzis (2018). Dress like a Star: Retrieving Fashion Products from Videos. In: CVF workshop ICCV 2017

Feature Indexing

Temporal Local Aggregation

SLIDE 29

Search and Retrieval

Asymmetric Visual Retrieval

Garcia & Vogiatzis (2018). Dress like a Star: Retrieving Fashion Products from Videos. In: CVF workshop ICCV 2017

Temporal Local Aggregation

SLIDE 30

Asymmetric Visual Retrieval

No temporal aggregation Chapter 5 Chapter 6

SLIDE 31

Asymmetric Visual Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Spatio-Temporal Global Aggregation

SLIDE 32

Asymmetric Visual Retrieval

Temporal Local Aggregation (Chapter 5):
  • High accuracy
  • High compression rates
  • Multiple searches per query

Spatio-Temporal Global Aggregation (Chapter 6):
  • State-of-the-art accuracy with global aggregation
  • High compression rates
  • Single search per query

SLIDE 33

Contributions

  • CNNs for non-metric visual similarity
  • Pushing performance on standard CBIR datasets
  • MoviesDB: image-to-video retrieval dataset
  • Binary descriptors for local aggregation of video features
  • Spatio-temporal encoders for global aggregation of video features
  • Item video retrieval application

Symmetric Visual Retrieval Asymmetric Visual Retrieval

SLIDE 34

  • Introduction and Background
  • Symmetric Visual Retrieval
  • Asymmetric Visual Retrieval
  • Cross-Modal Retrieval

SLIDE 35

Cross-Modal Retrieval

Retrieve paintings from artistic comments

  • Artistic Comments:

○ Not only descriptions of the content, but also information about the author, context, techniques, etc.

  • Fine-art paintings:

○ Figurative representations

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 36

Cross-Modal Retrieval

  • Visual Encoding (images):

VGG16, ResNet, RMAC

  • Text Encoding (comments and titles):

BOW, MLP, RNN

  • Cross-Modal Transformation:

CCA, Cosine Margin Loss, Augmented with Metadata

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 37

Cross-Modal Retrieval

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

Human Comparison: Difficult Set Human Comparison: Easy Set

Random images Same type images

SLIDE 38

Contributions

  • CNNs for non-metric visual similarity
  • Pushing performance on standard CBIR datasets
  • MoviesDB: image-to-video retrieval dataset
  • Binary descriptors for local aggregation of video features
  • Spatio-temporal encoders for global aggregation of video features
  • Item video retrieval application
  • SemArt: semantic art understanding dataset
  • Cross-modal retrieval for semantic art understanding

Symmetric Visual Retrieval Asymmetric Visual Retrieval Cross-Modal Retrieval

SLIDE 39

  • Introduction and Background
  • Symmetric Visual Retrieval
  • Asymmetric Visual Retrieval
  • Cross-Modal Retrieval
  • Conclusions and Final Remarks

SLIDE 40

Future Work

  • Similarity networks for other retrieval tasks
  • Temporal aggregation at the scene level
  • Asymmetric techniques for video-to-image retrieval
  • Style and content detector for cross-modal retrieval in art
  • SemArt dataset for alternative tasks

Symmetric Visual Retrieval Asymmetric Visual Retrieval Cross-Modal Retrieval

SLIDE 41

Q&A

SLIDE 42

  • Introduction and Background
  • Symmetric Visual Retrieval

SLIDE 43

Content-Based Image Retrieval

SLIDE 44

Similarity Networks

  • Input: Concatenation of feature vectors
  • Architecture: Fully connected layers with ReLU
  • Output: Similarity score

Loss Function

Network Output
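The architecture described above (concatenated features through fully connected layers with ReLU, producing one similarity score) can be sketched in plain Python. The weights below are hypothetical hand-set values purely for illustration; the real network learns them from data:

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(v, weights, bias):
    # One fully connected layer; weights is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def similarity_network(feat_a, feat_b, w1, b1, w2, b2):
    # Input: concatenation of the two feature vectors.
    x = feat_a + feat_b
    # Hidden fully connected layer with ReLU.
    h = relu(linear(x, w1, b1))
    # Output: a single similarity score.
    return linear(h, w2, b2)[0]

# Tiny hypothetical weights: 4-D input -> 2 hidden units -> 1 score.
w1 = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]]
b1 = [0.0, 0.0]
w2 = [[0.5, 0.5]]
b2 = [0.0]
score = similarity_network([1.0, 0.0], [1.0, 0.0], w1, b1, w2, b2)
```

Unlike a metric distance, nothing constrains this learned score to be symmetric or to satisfy the triangle inequality.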

SLIDE 45

Similarity Networks

  • Input: Concatenation of feature vectors
  • Architecture: Fully connected layers with ReLU
  • Output: Similarity score

Loss Function

Pair Label

SLIDE 46

Similarity Networks

  • Input: Concatenation of feature vectors
  • Architecture: Fully connected layers with ReLU
  • Output: Similarity score

Loss Function

Margin

SLIDE 47

Similarity Networks

  • Input: Concatenation of feature vectors
  • Architecture: Fully connected layers with ReLU
  • Output: Similarity score

Loss Function

Standard Similarity

SLIDE 48

Decrease score in dissimilar pairs
Increase score in similar pairs

Similarity Networks

  • Input: Concatenation of feature vectors
  • Architecture: Fully connected layers with ReLU
  • Output: Similarity score

Loss Function
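One plausible reading of the loss built up over these slides (network output, pair label, margin, standard similarity): regress the network score toward the standard metric similarity shifted up by a margin for similar pairs and down for dissimilar ones. This is a hedged sketch; the exact formulation in the paper may differ:

```python
def similarity_loss(score, standard_sim, pair_label, margin):
    # For similar pairs (label 1), push the network score above the
    # standard (metric) similarity by a margin; for dissimilar pairs
    # (label 0), push it below by the same margin.
    if pair_label == 1:
        target = standard_sim + margin   # increase score in similar pairs
    else:
        target = standard_sim - margin   # decrease score in dissimilar pairs
    return 0.5 * (score - target) ** 2

loss_pos = similarity_loss(score=0.6, standard_sim=0.7, pair_label=1, margin=0.2)
loss_neg = similarity_loss(score=0.6, standard_sim=0.7, pair_label=0, margin=0.2)
```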

SLIDE 49

Similarity Networks

Training Considerations:

  • Supervised - classification labels
  • Important to train on same domain as test
  • Emphasis on difficult pairs

○ First train the network with random pairs
○ Then re-train using pairs where the network performs worse than the standard metric
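One plausible reading of that re-training criterion, written as a pair-mining filter (the `(network_score, metric_score, label)` pair representation here is hypothetical):

```python
def mine_hard_pairs(pairs):
    # Each pair: (network_score, metric_score, label).
    # Keep pairs where the network does worse than the standard metric:
    # similar pairs (label 1) it under-scores, dissimilar pairs (label 0)
    # it over-scores.
    hard = []
    for net, metric, label in pairs:
        if (label == 1 and net < metric) or (label == 0 and net > metric):
            hard.append((net, metric, label))
    return hard

pairs = [(0.9, 0.8, 1), (0.4, 0.7, 1), (0.6, 0.3, 0), (0.1, 0.2, 0)]
hard = mine_hard_pairs(pairs)  # only the two mistakes survive
```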

SLIDE 50

Similarity Networks

Experiments

  • RMAC as feature extractor
  • Test on the Oxford and Paris datasets
  • Train on the Landmarks dataset (33k images)

SLIDE 51

Similarity Networks

Take-away: Results in CBIR can be further improved not only by improving the feature representation but also by estimating a better visual similarity score.

SLIDE 52

Similarity Networks

End-to-End CBIR

SLIDE 53

Similarity Networks

End-to-End CBIR

SLIDE 54

Similarity Networks

End-to-End CBIR

SLIDE 55

  • Introduction and Background
  • Symmetric Visual Retrieval
  • Asymmetric Visual Retrieval

SLIDE 56

Image-to-Video Retrieval

Related Work

  • Hand-crafted based:

○ SIFT + BOW (Zhu and Satoh, 2012)
○ Fisher Vector + Bloom Filter (Araujo and Girod, 2017)

  • Deep Learning based:

○ Pooling of pre-trained CNN features (Wang et al., 2017)

Zhu and Satoh, ICMR 2012

SLIDE 57

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

SLIDE 58

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Spatial Encoder Temporal Encoder

SLIDE 59

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Spatial Encoder

  • Re-implementation of RMAC features (Tolias et al., ICLR 2016)
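A heavily simplified, single-channel sketch of the R-MAC idea: max-pool over overlapping square regions of the activation map, then aggregate the regional maxima. Real R-MAC operates per channel at several region scales, with l2-normalisation and PCA-whitening, which this sketch omits:

```python
def max_pool_region(fmap, top, left, size):
    # Maximum activation inside one square region of a feature channel.
    return max(fmap[r][c]
               for r in range(top, top + size)
               for c in range(left, left + size))

def rmac_1channel(fmap, region_size):
    # Slide a square region over the map and sum the regional maxima.
    n = len(fmap)
    maxima = [max_pool_region(fmap, r, c, region_size)
              for r in range(n - region_size + 1)
              for c in range(n - region_size + 1)]
    return sum(maxima)

# Toy 3x3 activation map for a single channel.
fmap = [[0.0, 1.0, 0.0],
        [2.0, 0.0, 0.0],
        [0.0, 0.0, 3.0]]
descriptor = rmac_1channel(fmap, 2)  # four overlapping 2x2 regions
```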
SLIDE 60

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Temporal Encoder

  • Shot boundary detection

○ Distance between consecutive frames
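Shot boundary detection from distances between consecutive frames can be sketched as follows; the threshold is a hypothetical tuning parameter:

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def detect_shot_boundaries(frame_features, threshold):
    # Mark a boundary wherever the distance between consecutive
    # frame descriptors exceeds the threshold.
    boundaries = []
    for i in range(1, len(frame_features)):
        if euclidean(frame_features[i - 1], frame_features[i]) > threshold:
            boundaries.append(i)
    return boundaries

# Toy 2-D frame descriptors: a big jump before frame 3 starts a new shot.
frames = [[0.0, 0.0], [0.1, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 5.0]]
boundaries = detect_shot_boundaries(frames, threshold=1.0)
```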

SLIDE 61

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Temporal Encoder

  • Shot boundary detection

○ Distance between consecutive frames

  • Aggregation with Recurrent Neural Networks
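A minimal Elman-style recurrence illustrating how an RNN can aggregate the frames of a shot into a single vector; the weights are hypothetical and the paper's encoder is more elaborate:

```python
import math

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def rnn_aggregate(frames, w_in, w_rec):
    # Run the recurrence over the shot's frame features; the final
    # hidden state serves as the shot descriptor.
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        pre = [sum(wi * x for wi, x in zip(row_in, frame)) +
               sum(wr * h for wr, h in zip(row_rec, hidden))
               for row_in, row_rec in zip(w_in, w_rec)]
        hidden = tanh_vec(pre)
    return hidden

# Hypothetical weights: 2-D frame features, 2-D hidden state.
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_rec = [[0.5, 0.0], [0.0, 0.5]]
shot_vector = rnn_aggregate([[0.2, 0.1], [0.1, 0.3]], w_in, w_rec)
```

Collapsing each shot to one vector is what makes a single search per query possible at retrieval time.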
SLIDE 62

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Training

  • Cleaned LSMDC dataset
  • Matching and non-matching video-frame pairs
  • Cosine Margin Loss
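A common cosine-margin formulation, given here as a hedged sketch of the objective named on the slide: pull matching frame/video embeddings together and push non-matching ones below a margin.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def cosine_margin_loss(img_emb, vid_emb, match, margin=0.2):
    # Matching pairs pay for any similarity below 1; non-matching
    # pairs pay only when their similarity exceeds the margin.
    sim = cosine(img_emb, vid_emb)
    if match:
        return 1.0 - sim
    return max(0.0, sim - margin)

loss_match = cosine_margin_loss([1.0, 0.0], [1.0, 0.0], match=True)
loss_mismatch = cosine_margin_loss([1.0, 0.0], [1.0, 0.1], match=False)
```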
SLIDE 63

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Evaluation

  • Videos:

○ SI2V and VB: newscast videos
○ MoviesDB: movie videos

  • Queries:

○ SI2V: images from newspapers
○ VB and MoviesDB: photos taken with an external device

SLIDE 64

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Results

SLIDE 65

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Results

  • FV methods use extremely large descriptors
SLIDE 66

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Results

  • FV methods use extremely large descriptors
  • Previous deep features methods:

○ Fully connected layers
○ No fine-tuning

SLIDE 67

Image-to-Video Retrieval

Garcia & Vogiatzis (2018). Asymmetric Spatio-Temporal Embeddings for Large-Scale Image-to-Video Retrieval. In: BMVC 2018

Results

  • FV methods use extremely large descriptors
  • Previous deep features methods:

○ Fully connected layers
○ No fine-tuning

  • Our Spatio-Temporal Encoder performs as well as the state of the art while using less memory

SLIDE 68

  • Introduction and Background
  • Symmetric Visual Retrieval
  • Asymmetric Visual Retrieval
  • Cross-Modal Retrieval

SLIDE 69

Semantic Art Understanding

SemArt is a dataset for studying semantic art understanding, in which each sample is a triplet: (painting, attributes, comment). Attributes: author, title, date, technique, type, school, timeframe. Collection: about 21,000 triplets.

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 70

Semantic Art Understanding

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

  • Project Paintings and Comments into a Common Semantic Space
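The common-semantic-space idea can be sketched as two linear projections, one per modality, so that matching paintings and comments land near each other. The matrices below are hypothetical stand-ins for learned weights:

```python
def project(vec, matrix):
    # Linear projection of a modality-specific encoding into the joint space.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

# Hypothetical projection matrices into a 2-D joint space.
w_visual = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 3-D image encoding -> 2-D
w_text = [[0.0, 1.0], [1.0, 0.0]]               # 2-D comment encoding -> 2-D

painting = project([0.9, 0.1, 0.4], w_visual)
comment = project([0.1, 0.9], w_text)
# A matching pair maps to the same point in the joint space,
# so nearest-neighbour search retrieves it.
```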
SLIDE 71

Semantic Art Understanding

  • Visual Encoding:

VGG16, ResNet, RMAC

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 72

Semantic Art Understanding

  • Visual Encoding:

VGG16, ResNet, RMAC

  • Text Encoding (comments and titles):

BOW, MLP, RNN

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018
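Of the text encoders listed, bag-of-words is the simplest; a minimal sketch over an illustrative vocabulary (the real vocabulary would be built from the SemArt corpus):

```python
def bow_encode(text, vocabulary):
    # Count vocabulary-word occurrences in a comment; out-of-vocabulary
    # tokens are ignored.
    counts = [0] * len(vocabulary)
    index = {word: i for i, word in enumerate(vocabulary)}
    for token in text.lower().split():
        if token in index:
            counts[index[token]] += 1
    return counts

vocabulary = ["portrait", "landscape", "oil", "canvas"]
vector = bow_encode("Oil on canvas portrait", vocabulary)
```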

SLIDE 73

Semantic Art Understanding

  • Visual Encoding:

VGG16, ResNet, RMAC

  • Text Encoding (comments and titles):

BOW, MLP, RNN

  • Cross-Modal Transformation:

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 74

Semantic Art Understanding

  • Visual Encoding:

VGG16, ResNet, RMAC

  • Text Encoding (comments and titles):

BOW, MLP, RNN

  • Cross-Modal Transformation:

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 75

Semantic Art Understanding

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 76

Semantic Art Understanding

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

SLIDE 77

Human Comparison: Easy Set / Human Comparison: Difficult Set

Semantic Art Understanding

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

Random images Same type images

SLIDE 78

Human Comparison: Easy Set / Human Comparison: Difficult Set

Semantic Art Understanding

Garcia & Vogiatzis (2018). How to Read Paintings: Semantic Art Understanding with Multi-Modal Retrieval. In: VISART workshop ECCV 2018

Random images Same type images