
Phrase-based Image Captioning: Rémi Lebret, Pedro O. Pinheiro, Ronan Collobert (PowerPoint presentation)



  1. Phrase-based Image Captioning Rémi Lebret , Pedro O. Pinheiro, Ronan Collobert Idiap Research Institute / EPFL ICML, 9 July 2015

  2. Image Captioning ◮ Objective: Generate descriptive sentences given a sample image. [Figure: image → Model → "A man is grinding a ramp on a skateboard."] Rémi Lebret (Idiap Research Institute / EPFL) Phrase-based Image Captioning ICML 2015 2 / 18

  3–6. Related Works ◮ Recent models based on Deep CNN + RNN [Vinyals et al., Karpathy & Fei-Fei, Mao et al., Donahue et al.]. Example caption: "A man is grinding a ramp on a skateboard." ◮ Visual features with Deep CNN. ◮ Sentence generation with RNN (e.g. LSTM). Can similar performance be achieved with a simpler model?

  7–13. Syntax Analysis of Image Descriptions ◮ A given image i ∈ I and its ground-truth descriptions s ∈ S, each split into chunks: "a man riding a skateboard up the side of a wooden ramp" (NP VP NP PP NP PP NP); "a man is grinding a ramp on a skateboard" (NP VP NP PP NP); "man riding on edge of an oval ramp with a skate board" (NP VP NP PP NP PP NP); "a man in a helmet skateboarding before an audience" (NP PP NP PP NP); "a man on a skateboard is doing a trick" (NP PP NP VP NP). → Chunking approach to identify the sentence constituents. ◮ Noun phrases (NP): key elements in images. ◮ Verbal phrases (VP) and prepositional phrases (PP): interactions between elements.
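
The slides do not say which chunker produced these labels. As a minimal sketch, assuming each word already carries an IOB chunk tag (B-X opens a chunk of type X, I-X continues it, O is outside any chunk; the tags below are hand-written for illustration, not the output of the authors' tool), grouping words into constituents and reading off the chunk sequence could look like this:

```python
from typing import List, Tuple


def iob_to_chunks(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group words into (label, phrase) chunks from IOB tags such as B-NP, I-NP, O."""
    chunks: List[Tuple[str, List[str]]] = []
    for word, tag in zip(words, tags):
        if tag == "O":
            # Token outside any chunk (e.g. punctuation) becomes its own "O" item.
            chunks.append(("O", [word]))
        elif tag.startswith("B-") or not chunks or chunks[-1][0] != tag[2:]:
            # Open a new chunk on B-, or on an I- tag that cannot continue the previous chunk.
            chunks.append((tag[2:], [word]))
        else:
            # I- tag continuing the current chunk.
            chunks[-1][1].append(word)
    return [(label, " ".join(ws)) for label, ws in chunks]


# Hypothetical hand-written tags for one of the ground-truth captions.
words = "a man is grinding a ramp on a skateboard".split()
tags = ["B-NP", "I-NP", "B-VP", "I-VP", "B-NP", "I-NP", "B-PP", "B-NP", "I-NP"]

chunks = iob_to_chunks(words, tags)
print(chunks)
print(" ".join(label for label, _ in chunks))  # NP VP NP PP NP
```

The resulting sequence matches the NP VP NP PP NP pattern given for this caption on the slide.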

  14. Large-scale Syntax Analysis ◮ Two datasets: Flickr30k + COCO (≈ 560k training sentences). [Figure: appearance frequencies (%) and cumulative distribution function of the chunk sequences found in the training captions, e.g. NP VP NP PP NP O, NP PP NP VP NP O, NP VP NP O.] ◮ Describing images: 1. Predicting NP, VP and PP. 2. Finding how they all interact.
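
A sketch of the statistic in this slide's figure, assuming every training caption has already been reduced to its chunk sequence: count each distinct sequence, turn the counts into appearance frequencies (%), and accumulate them into a cumulative distribution. The toy input reuses the chunk sequences of the five example captions from the syntax-analysis slide; it is not Flickr30k or COCO data.

```python
from collections import Counter
from typing import List, Tuple


def pattern_distribution(sequences: List[str]) -> List[Tuple[str, float, float]]:
    """Return (pattern, appearance frequency in %, cumulative fraction) per distinct
    pattern, most frequent first (ties keep first-seen order)."""
    counts = Counter(sequences)
    total = sum(counts.values())
    rows: List[Tuple[str, float, float]] = []
    seen = 0
    for pattern, n in counts.most_common():
        seen += n
        rows.append((pattern, 100.0 * n / total, seen / total))
    return rows


# Toy input: chunk sequences of the five example captions.
captions = [
    "NP VP NP PP NP PP NP",
    "NP VP NP PP NP",
    "NP VP NP PP NP PP NP",
    "NP PP NP PP NP",
    "NP PP NP VP NP",
]

rows = pattern_distribution(captions)
for pattern, freq, cdf in rows:
    print(f"{pattern:<22} {freq:5.1f}%  CDF={cdf:.2f}")
```

On the real corpus the same two columns give the bars (frequency) and the rising curve (cumulative distribution) of the plot.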

  15. Phrase-based Model for Image Descriptions ◮ Our approach: 1. A bilinear model that learns a metric between an image and the phrases used to describe it. 2. Sentences generated using a simple language model based on caption syntax statistics.
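
The slide only says the language model is "based on caption syntax statistics" and does not give its form. One hypothetical, minimal instance of such statistics is a bigram over chunk labels, P(next label | current label), estimated by counting; this is an illustration, not the authors' model. It reuses the five example chunk sequences:

```python
from collections import Counter, defaultdict
from typing import Dict, List


def chunk_bigram_model(sequences: List[str]) -> Dict[str, Dict[str, float]]:
    """Maximum-likelihood estimate of P(next chunk label | current chunk label).

    <s> and </s> mark the start and end of each caption.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for seq in sequences:
        labels = ["<s>"] + seq.split() + ["</s>"]
        for prev, nxt in zip(labels, labels[1:]):
            counts[prev][nxt] += 1
    model: Dict[str, Dict[str, float]] = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        model[prev] = {nxt: c / total for nxt, c in nexts.items()}
    return model


captions = [
    "NP VP NP PP NP PP NP",
    "NP VP NP PP NP",
    "NP VP NP PP NP PP NP",
    "NP PP NP PP NP",
    "NP PP NP VP NP",
]

model = chunk_bigram_model(captions)
print(model["<s>"])  # {'NP': 1.0}
print(model["NP"])   # what tends to follow a noun phrase
```

Such a table could constrain generation to plausible orderings (for example, a PP is always followed by an NP in this toy data), with the image-conditioned phrase scores deciding which NP, VP or PP fills each slot.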

  16. A Bilinear Model ◮ $U^\top V$, with trainable parameters $\theta = \{U, V\}$, where $U = (u_{c_1}, \dots, u_{c_{|C|}}) \in \mathbb{R}^{m \times |C|}$ and $V \in \mathbb{R}^{m \times n}$. ◮ $I$ = set of training images; $C$ = set of all phrases used to describe $I$. [Figure: an image linked through $V$ and $U$ to example phrases; NP: "a man", "a skate board", "a wooden ramp"; VP: "riding", "is grinding"; PP: "on", "with".] Descriptions of the example image: A man in a helmet skateboarding before an audience. Man riding on edge of an oval ramp with a skate board. A man riding a skateboard up the side of a wooden ramp. A man on a skateboard is doing a trick. A man is grinding a ramp on a skateboard.
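
A minimal NumPy sketch of scoring every phrase in C for one image with the bilinear form U^T V. It assumes the image is given as an n-dimensional feature vector z (an assumption: the slide does not name the image representation) and uses random parameters with made-up sizes, since training is not covered here. The phrase list is taken from the slide's figure.

```python
from typing import List

import numpy as np


def phrase_scores(U: np.ndarray, V: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Bilinear scores s = U^T V z, one score per phrase (shape (|C|,)).

    U: (m, |C|) with phrase vectors u_c as columns; V: (m, n);
    z: (n,) image feature vector (hypothetical input).
    """
    if U.shape[0] != V.shape[0] or V.shape[1] != z.shape[0]:
        raise ValueError("expected U (m, |C|), V (m, n), z (n,)")
    # Map the image into the m-dimensional phrase space first, then take a
    # dot product with every phrase vector u_c.
    return U.T @ (V @ z)


def top_phrases(scores: np.ndarray, phrases: List[str], k: int) -> List[str]:
    """Return the k highest-scoring phrases, best first."""
    order = np.argsort(-scores)[:k]
    return [phrases[i] for i in order]


# Hypothetical sizes and random (untrained) parameters, for shape checking only.
phrases = ["a man", "a skate board", "a wooden ramp", "riding", "is grinding", "on", "with"]
rng = np.random.default_rng(0)
m, n = 4, 6
U = rng.normal(size=(m, len(phrases)))
V = rng.normal(size=(m, n))
z = rng.normal(size=n)

scores = phrase_scores(U, V, z)
print(scores.shape)  # (7,)
print(top_phrases(scores, phrases, 3))
```

Computing V z before multiplying by U^T costs about m·n + m·|C| operations per image, instead of materializing the |C| × n matrix U^T V, which matters when the phrase set C is large.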
