
Understanding complex scenes a man holding a tennis racquet on - PowerPoint PPT Presentation



  1.    

  2. Understanding complex scenes. Example captions: "a man holding a tennis racquet on a tennis court"; "the man is on the tennis court playing a game".

  3.       

  4. Knowledge: Freebase. Text: "Barack Obama is an American politician serving as the 44th President of the United States. Born in Honolulu, Hawaii, … in 2008, he defeated Republican nominee and was inaugurated as president on January 20, 2009." (Wikipedia.org). Vision: http://s122.photobucket.com/user/bmeuppls/media/stampede.jpg.html

  5. Winning entries of COCO 2015 Caption Challenge. Compositional framework is *less elegant* but can potentially exploit non-paired image-caption data more effectively.

  6. Turing Test Results at the MS COCO Captioning Challenge 2015

     | System | % of captions that pass the Turing Test | Official Rank |
     |---|---|---|
     | MSR | 32.2% | 1st |
     | Google | 31.7% | 1st |
     | MSR Captivator | 30.1% | 3rd |
     | Montreal/Toronto | 27.2% | 3rd |
     | Berkeley LRCN | 26.8% | 5th |
     | Human | 67.5% | -- |

     Still a big gap! Other groups: Baidu/UCLA, Stanford, Tsinghua, etc.

  7.     

  8. [Figure: captioning pipeline. Labeled components: ConvNets, features vector, visual concepts, celebrity model, landmark model, language model, DMSM, confidence model. High-confidence output: "A small boat in Ha Long Bay". Low-confidence output: "This image contains: water, boat, lake, mountain, etc."] [Kenneth Tran, Xiaodong He, Lei Zhang, Jian Sun, Cornelia Carapcea, Chris Thrasher, Chris Buehler, Chris Sienkiewicz, submitted to CVPR Deep Vision 2016]

  9. [He, Zhang, Ren, Sun, 2015]    

  10. [Figure: caption generation from detected words: cabinets, room, wooden, kitchen, stove, sink, floor.] Repeat to generate 500 candidates. [Fang, et al., CVPR 2015]

  11. The deep multimodal semantic model [Fang, et al., CVPR 2015]. Semantic space: The overall semantics of a caption will also be represented by a vector in this space. If these two vectors are close to each other, then the caption is a good match for the image. Otherwise, not a matching caption. [Figure: two networks with hidden layers H1, H2, H3 and weights W1, W2, W3, W4; inputs t1 and s. Text branch: "a man holding a tennis racquet on a tennis court", fully connected. Image branch: raw image pixels, convolution/pooling, image feature.] [Huang, He, Gao, Deng et al., 2013] [He, Zhang, Ren, Sun, 2015]

  12.    [Guo, Zhang, Hu, He, Gao, 2016]

  13. [Figure: two networks with hidden layers H1, H2, H3 and weights W1, W2, W3, W4; inputs t1 and s. Caption: "a man holding a tennis racquet on a tennis court". Image.]

  14. Human evaluation on 1000 random samples of the COCO test set.

      | System | Excellent | Good | Bad | Embarrassing |
      |---|---|---|---|---|
      | Fang et al., 2015 | 40.6% | 26.8% | 28.8% | 3.8% |
      | New system | 51.8% | 23.4% | 22.5% | 2.4% |

  15. Human evaluation on Instagram test set, which contains 1380 random images that we scraped from Instagram.

      | System | Excellent | Good | Bad | Embarrassing |
      |---|---|---|---|---|
      | Fang et al., 2015 | 12.0% | 13.4% | 63.0% | 11.6% |
      | New system | 25.4% | 24.1% | 45.3% | 5.2% |

  16. Cognitive Services http://CaptionBot.ai
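
The matching rule stated for the deep multimodal semantic model (slide 11) is that an image and a caption are each represented by a vector in one semantic space, and a caption is a good match when the two vectors are close. A minimal sketch of that ranking step follows. It assumes cosine similarity as the closeness measure, which the slides do not specify, and uses hypothetical hand-written 3-dimensional vectors in place of the outputs of the image and text networks. The function names `cosine_similarity` and `rank_captions` are illustrative, not taken from the presentation.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_captions(image_vec, caption_vecs):
    """Return (score, caption) pairs sorted from closest to farthest from the image vector."""
    scored = [
        (cosine_similarity(image_vec, vec), caption)
        for caption, vec in caption_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical embeddings: in the real model these would come from the
# image network and the text network, not from hand-written numbers.
image_vec = [0.9, 0.1, 0.3]
caption_vecs = {
    "a man holding a tennis racquet on a tennis court": [0.8, 0.2, 0.4],
    "a bowl of fruit on a table": [-0.1, 0.9, -0.3],
}

ranked = rank_captions(image_vec, caption_vecs)
best_score, best_caption = ranked[0]
print(best_caption)
```

With these toy vectors the tennis caption ranks first because its vector points in nearly the same direction as the image vector, while the unrelated caption has a negative similarity. The same function could re-rank a larger pool of generated candidates, such as the 500 candidates mentioned on slide 10.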
