Image Identification with Natural Language Specification


Qi Feng, Donghyun Kim
Department of Computer Science, Boston University
fung@bu.edu, donhk@bu.edu
December 08, 2017


  1. Image Identification with Natural Language Specification
     Qi Feng, Donghyun Kim
     Department of Computer Science, Boston University
     fung@bu.edu, donhk@bu.edu
     December 08, 2017

  2. Outline
     (1) Introduction
     (2) Methods
     (3) Results
     (4) Saliency Map

  3. Photo Search
     Figure: Screenshot of a natural language search on Google Photos.

  4. The Problem
     Figure: Identification of the target image by natural language specification.

  5. GloVe Embedding
     GloVe is an unsupervised learning algorithm for obtaining vector representations of words [2].
     Figure: Projection of word embeddings into 2D space.

  6. The Baseline Model
     ▶ Cosine similarity
       ▶ average of word embeddings [2]
         ▶ the input query
         ▶ a generated caption for an image [4]
     ▶ The Inception v3
       ▶ pretrained on the ILSVRC-2012-CLS [3].
     ▶ The language model
       ▶ trained 20,000 iterations on MSCOCO [1].
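
The baseline comparison can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes captions have already been generated for each image, and the function names (`average_embedding`, `cosine_similarity`, `rank_images`) and the tiny 3-dimensional embedding table are hypothetical stand-ins for real GloVe vectors.

```python
import math

def average_embedding(tokens, embeddings):
    """Mean of the vectors for the tokens found in the embedding table."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has an embedding")
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_images(query, captions, embeddings):
    """Return image ids sorted from most to least similar to the query."""
    q = average_embedding(query.lower().split(), embeddings)
    scores = {
        image_id: cosine_similarity(
            q, average_embedding(caption.lower().split(), embeddings)
        )
        for image_id, caption in captions.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Toy vectors invented for illustration; real GloVe vectors are much larger.
toy_embeddings = {
    "dog": [0.9, 0.1, 0.0],
    "puppy": [0.8, 0.2, 0.1],
    "grass": [0.1, 0.9, 0.0],
    "car": [0.0, 0.1, 0.9],
    "street": [0.1, 0.2, 0.8],
}
# Stand-ins for captions produced by the captioning model.
toy_captions = {
    "img_a": "dog grass",
    "img_b": "car street",
}
ranking = rank_images("puppy grass", toy_captions, toy_embeddings)
```

Averaging discards word order, so under this scheme "dog chasing a car" and "car chasing a dog" map to the same query vector.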

  7. The Proposed Model
     [Diagram labels: Image, Query, CNN, Visual Representation, concat, Language Model (LSTM), Similarity]
     Figure: The proposed model. Red rounded rectangles are inputs to the model. The blue rectangle is the intermediate result from the convolutional neural network.
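
One possible reading of this diagram is sketched below in NumPy. The slide does not say where the concatenation happens or how the similarity is produced, so several choices here are assumptions: the visual feature is concatenated with each word embedding before it enters the LSTM, the final hidden state is mapped to a score by a sigmoid, and all weights are random rather than trained. The names `lstm_step`, `similarity_score`, and `init_params` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM step; W maps [x; h] to the four stacked gate pre-activations."""
    z = W @ np.concatenate([x, h]) + b
    H = h.shape[0]
    i = sigmoid(z[0:H])            # input gate
    f = sigmoid(z[H:2 * H])        # forget gate
    o = sigmoid(z[2 * H:3 * H])    # output gate
    g = np.tanh(z[3 * H:4 * H])    # candidate cell state
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)
    return h_new, c_new

def similarity_score(visual_feat, word_vecs, params):
    """Score one (image, query) pair; the result lies in (0, 1)."""
    W, b, w_out, b_out = params
    H = b.shape[0] // 4
    h = np.zeros(H)
    c = np.zeros(H)
    for word in word_vecs:
        # Assumed placement of "concat": image feature joined to every word.
        x = np.concatenate([visual_feat, word])
        h, c = lstm_step(x, h, c, W, b)
    # Assumed output head: sigmoid over a linear map of the last hidden state.
    return float(sigmoid(w_out @ h + b_out))

def init_params(visual_dim, word_dim, hidden_dim, seed=0):
    """Random, untrained parameters, for shape checking only."""
    rng = np.random.default_rng(seed)
    in_dim = visual_dim + word_dim + hidden_dim
    W = rng.normal(0.0, 0.1, size=(4 * hidden_dim, in_dim))
    b = np.zeros(4 * hidden_dim)
    w_out = rng.normal(0.0, 0.1, size=hidden_dim)
    return W, b, w_out, 0.0

rng = np.random.default_rng(1)
params = init_params(visual_dim=8, word_dim=5, hidden_dim=6)
image_feat = rng.normal(size=8)    # stand-in for a CNN feature vector
query = rng.normal(size=(4, 5))    # stand-in for four word embeddings
score = similarity_score(image_feat, query, params)
```

At identification time, a score like this would be computed for every candidate image against the same query, and the highest-scoring image returned.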

  8. Training and Testing
     Figure: Positive Training Data

  9. Results

  10. Results cont.
      ▶ The Baseline Model: 91.1%
      ▶ The Proposed Model: 93.4%

  11. Excitation Back-propagation for Saliency Map
      ▶ Goal
        ▶ Find the salient regions of the input by back-propagation, in order to interpret the model's predictions.
      ▶ Assumptions
        ▶ The response of an activation neuron is non-negative.
        ▶ An activation neuron is tuned to detect certain visual features. Its response is positively correlated with its confidence in the detection.
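
A single-layer step of Excitation Back-propagation can be sketched as below. The slides do not spell out the update rule; this follows the commonly published formulation, in which the probability held by an output neuron is shared among its inputs in proportion to activation times weight, using only non-negative weights. The function name and the toy numbers are hypothetical.

```python
def excitation_backprop_layer(activations, weights, top_probs):
    """Propagate winning probabilities from a layer's outputs to its inputs.

    activations: non-negative input responses a_j
    weights:     weights[i][j] connects input j to output i
    top_probs:   probability P(a_i) held by each output neuron
    """
    bottom_probs = [0.0] * len(activations)
    for i, p_top in enumerate(top_probs):
        # Only excitatory (non-negative) connections take part.
        contrib = [a * max(w, 0.0) for a, w in zip(activations, weights[i])]
        total = sum(contrib)
        if total == 0.0:
            continue  # no excitatory input: this neuron's share is dropped
        for j, c in enumerate(contrib):
            bottom_probs[j] += p_top * c / total
    return bottom_probs

# Toy layer with three inputs and two outputs.
activations = [1.0, 2.0, 0.5]
weights = [
    [0.5, -1.0, 1.0],
    [0.0, 0.25, 1.0],
]
top_probs = [0.6, 0.4]
bottom_probs = excitation_backprop_layer(activations, weights, top_probs)
```

Because each output neuron's probability is divided among its excitatory inputs, the total probability is conserved from layer to layer whenever every output has at least one excitatory input. Applying the step down to the input yields a map over image locations (spatial saliency) or over query words (temporal saliency).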

  12. Spatial and Temporal Saliency
      Figure: Spatial and temporal saliency on MS-COCO. Original images are on the left and saliency maps on the right. The query is shown under each image. The red word marks the maximum temporal saliency.

  13. Conclusion
      ▶ A model that identifies an image from a natural language specification.
      ▶ An RNN to measure the similarity between images and queries.
      ▶ Excitation Back-propagation for finding spatial and temporal groundings.

  14. [1] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir D. Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: common objects in context. CoRR, abs/1405.0312, 2014.
      [2] Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pages 1532–1543, 2014.
      [3] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015.
      [4] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. Show and tell: Lessons learned from the 2015 MSCOCO image captioning challenge. CoRR, abs/1609.06647, 2016.
