

  1. MediaMill TRECVID 2010, 18-11-2010
Any Hope for Cross-Domain Concept Detection in Internet Video?
Cees G.M. Snoek, Koen E.A. van de Sande, Dennis C. Koelma, & Arnold W.M. Smeulders
Intelligent Systems Lab Amsterdam, University of Amsterdam, The Netherlands
Conclusions TRECVID 2009
• Multi-frame is a true performance booster
– 30% improvement over single-frame baseline
– Time for the community to move on to video analysis
http://www.MediaMill.nl

  2. Community myths or facts?
• Chua et al., ACM Multimedia 2007
– Video search is practically solved and progress has only been incremental
• Yang and Hauptmann, ACM CIVR 2008
– Current solutions are weak and generalize poorly
We have done an experiment
• Two video search engines, from 2006 and 2009
– MediaMill Challenge 2006 system
– MediaMill TRECVID 2009 system
• How well do they detect 36 LSCOM concepts?

  3. Four video data set mixtures
[Figure: training and testing data drawn from TRECVID 2005 and TRECVID 2007 (broadcast video news and documentary video), combined into within-domain and cross-domain train/test mixtures. Source: Snoek & Smeulders, IEEE Computer 2010.]
Performance doubled in just 3 years
• 36 concept detectors
– Even when using training data of different origin
– Vocabulary still limited

  4. State-of-the-Art
Snoek et al., TRECVID 2008-2009; Van de Sande et al., PAMI 2010; Van Gemert et al., PAMI 2010
Software available for download at http://colordescriptors.com
GPU is 5 times faster than quad-core CPU (Van de Sande et al., TMM 2011)
• Unresolved bottleneck: kernel-SVM prediction
– Cost = # support vectors × cost of one kernel computation, which is O(feature dimension)
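
The sketch below shows why kernel-SVM prediction is the bottleneck. The decision value f(x) = Σ_i c_i K(x, s_i) + b needs one kernel evaluation per support vector s_i, and each χ² kernel evaluation visits every feature dimension. Each prediction therefore costs about #SV × d operations. The sketch rests on several assumptions: it uses NumPy, one common exponential χ² kernel form with a hypothetical scale A, signed dual coefficients c_i = α_i y_i, and random toy sizes. It is not the MediaMill implementation.

```python
import numpy as np


def chi2_kernel(x, y, A=1.0, eps=1e-10):
    """Exponential chi-square kernel exp(-chi2(x, y) / A); touches all d dimensions.

    The exact kernel form and the scale A are assumptions for illustration.
    """
    chi2 = np.sum((x - y) ** 2 / (x + y + eps))
    return np.exp(-chi2 / A)


def svm_decision(x, support_vectors, dual_coefs, bias, A=1.0):
    """Naive kernel-SVM prediction: f(x) = sum_i c_i * K(x, s_i) + b.

    Returns the decision value and the number of per-dimension kernel terms
    evaluated, which equals n_sv * d.
    """
    value, terms = 0.0, 0
    for sv, c in zip(support_vectors, dual_coefs):
        value += c * chi2_kernel(x, sv, A)
        terms += sv.size                      # each kernel call is O(d)
    return value + bias, terms


rng = np.random.default_rng(0)
d, n_sv = 4000, 500                           # hypothetical sizes
svs = rng.random((n_sv, d))
svs /= svs.sum(axis=1, keepdims=True)         # L1-normalized histograms
coefs = rng.standard_normal(n_sv)             # hypothetical signed dual coefficients
x = rng.random(d)
x /= x.sum()
f, terms = svm_decision(x, svs, coefs, bias=0.1)
print(terms)  # 2000000
```

This cost is paid for every frame that is scored, for every concept detector. With roughly 600K test frames, it becomes the dominant term.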

  5. Our TRECVID 2010 focus
• Baseline: TRECVID 2009 system
– 6 extra I-frames per shot, ~600K frames in test set
• Revisit multi-frame for Internet video
• Training from multiple domains
– Add 50K labels from TRECVID 2005-2009, ~170K frames in train set
– Requires efficient prediction
Maji et al., CVPR 2008: intersection kernel SVM classification is efficient
For the intersection kernel, h_i is piecewise linear and quite smooth (blue plot). We can approximate it with fewer, uniformly spaced segments (red plot). Saves time & space!
Slide credit: Subhransu Maji
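
The sketch below illustrates the additive structure that Maji et al. exploit. For the intersection kernel, the decision function splits per dimension: f(x) = Σ_d h_d(x_d) + b, with h_d(t) = Σ_i c_i min(t, s_id). This allows two ways to evaluate each h_d:

- **Exact:** sort the support-vector values once, then binary-search t.
- **Approximate:** use a table over uniformly spaced segments with linear interpolation (the red plot).

`FastHIK` is a hypothetical class written for illustration, not the authors' code. It assumes NumPy, non-negative histogram features, and signed dual coefficients c_i = α_i y_i.

```python
import numpy as np


class FastHIK:
    """Intersection-kernel SVM decision function evaluated per dimension.

    f(x) = sum_i c_i * sum_d min(x_d, s_id) + b = sum_d h_d(x_d) + b,
    where c_i are signed dual coefficients and features are non-negative.
    """

    def __init__(self, support_vectors, dual_coefs, bias, n_bins=100):
        sv = np.asarray(support_vectors, dtype=float)
        c = np.asarray(dual_coefs, dtype=float)
        n_sv, d = sv.shape
        order = np.argsort(sv, axis=0)                    # sort each dimension independently
        self.sorted_sv = np.take_along_axis(sv, order, axis=0)
        sorted_c = c[order]                               # coefficients aligned with sorted values
        zeros = np.zeros((1, d))
        # prefix[k, j]: sum of c * v over the k smallest values in dimension j
        self.prefix = np.vstack([zeros, np.cumsum(sorted_c * self.sorted_sv, axis=0)])
        # cum_c[k, j]: sum of c over the k smallest values in dimension j
        self.cum_c = np.vstack([zeros, np.cumsum(sorted_c, axis=0)])
        self.total_c = self.cum_c[-1]
        self.bias = float(bias)
        self.d = d

        # Piecewise-linear approximation: sample each h_j on a uniform grid [0, max_j].
        self.n_bins = n_bins
        upper = self.sorted_sv[-1]                        # largest support-vector value per dimension
        self.step = np.maximum(upper / n_bins, 1e-12)     # guard against all-zero dimensions
        grid = np.linspace(0.0, 1.0, n_bins + 1)[None, :] * upper[:, None]
        self.table = np.array([self._h(j, grid[j]) for j in range(d)])

    def _h(self, j, t):
        """Exact h_j(t) by binary search: O(log n_sv) per query value."""
        k = np.searchsorted(self.sorted_sv[:, j], t, side="right")  # values <= t
        return self.prefix[k, j] + t * (self.total_c[j] - self.cum_c[k, j])

    def exact(self, x):
        """O(d log n_sv) instead of the naive O(n_sv * d)."""
        return float(sum(self._h(j, x[j]) for j in range(self.d))) + self.bias

    def approx(self, x):
        """O(d): one table lookup plus linear interpolation per dimension."""
        x = np.asarray(x, dtype=float)
        pos = np.clip(x / self.step, 0, self.n_bins)      # fractional grid position
        lo = np.minimum(pos.astype(int), self.n_bins - 1)
        frac = pos - lo
        cols = np.arange(self.d)
        h = (1 - frac) * self.table[cols, lo] + frac * self.table[cols, lo + 1]
        return float(h.sum()) + self.bias


rng = np.random.default_rng(0)
svs = rng.random((200, 64))          # hypothetical: 200 support vectors, 64-dim histograms
coefs = rng.standard_normal(200)     # hypothetical signed dual coefficients
x = rng.random(64)
model = FastHIK(svs, coefs, bias=-0.3, n_bins=50)
naive = coefs @ np.minimum(x, svs).sum(axis=1) - 0.3
print(np.isclose(model.exact(x), naive))  # True
```

The two variants trade accuracy for speed:

- The exact variant costs O(d log #SV) per prediction.
- The table variant costs O(d), so its cost no longer grows with the number of support vectors. The price is a small interpolation error controlled by the hypothetical `n_bins` parameter.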

  6. Experiment 1: Avg vs Max (χ²)
Max multi-frame fusion appears the best choice for online video
[Figure: concept probability (0 to 1) across the frames of one shot, from shot boundary to shot boundary, with the keyframe marked; the appearance of a moving object is highlighted (emphasis added).]
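
The sketch below shows the two shot-level fusion rules compared in Experiment 1. It assumes a detector has already produced one concept probability per frame (keyframe plus extra I-frames). `fuse_shot_scores` is a hypothetical helper, and the probabilities are made up: a moving object is visible in only two frames and the keyframe misses it. In this case max keeps a high shot score while avg dilutes it.

```python
import numpy as np


def fuse_shot_scores(frame_probs, method="max"):
    """Combine per-frame concept probabilities of one shot into a shot score."""
    probs = np.asarray(frame_probs, dtype=float)
    if method == "max":
        return float(probs.max())
    if method == "avg":
        return float(probs.mean())
    raise ValueError(f"unknown fusion method: {method}")


# Hypothetical shot: keyframe first, then 6 extra I-frames; the object appears briefly.
shot = [0.05, 0.10, 0.85, 0.90, 0.10, 0.05, 0.05]
print(fuse_shot_scores(shot, "max"))  # 0.9
print(fuse_shot_scores(shot, "avg"))  # approximately 0.3
```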

  7. [Figure: the same shot, with the levels of the Max and Avg fused scores indicated on the probability axis; the appearance of a moving object is highlighted (emphasis added).]
Note: we submitted avg…
Experiment 2: χ² vs HIK (max)
HIK is 75 times faster, with negligible loss in average precision

  8. [Figure: top 21 results for “hand” and top 21 results for “protest”.]

  9. Experiment 3: adding labels
At best on par, often worse.
[Figure: top 21 results for “hand”.]

  10. [Figure: top 21 results for “protest”.]
TRECVID 2010 results
[Figure legend: MediaMill not submitted, MediaMill submitted, 97 other methods]
• When considering submitted runs only
– Best performer for 6 concepts
– Best overall

  11. Conclusions TRECVID 2010
• Internet video concept detection is feasible
– Use max for effective multi-frame fusion
– Use histogram intersection kernel for fast prediction
• We do not know how to exploit extra labeled training samples from other domains
– A good challenge!
Contact info
• Cees Snoek: http://staff.science.uva.nl/~cgmsnoek
• We are hiring!
– PhD and postdoc positions on video event retrieval

  12. References
The MediaMill TRECVID 2008-2010 Semantic Video Search Engine. C.G.M. Snoek et al. Proceedings of the TRECVID Workshop.
Evaluating Color Descriptors for Object and Scene Recognition. K.E.A. van de Sande, Th. Gevers, and C.G.M. Snoek. IEEE Trans. Pattern Analysis and Machine Intelligence, 2010.
On the Surplus Value of Semantic Video Analysis Beyond the Key Frame. C.G.M. Snoek, M. Worring, J.M. Geusebroek, D.C. Koelma, and F.J. Seinstra. Proc. IEEE Int’l Conference on Multimedia & Expo, 2005.
Empowering Visual Categorization with the GPU. K.E.A. van de Sande, T. Gevers, and C.G.M. Snoek. IEEE Trans. Multimedia, 2011.
Classification using Intersection Kernel Support Vector Machines is Efficient. S. Maji, A.C. Berg, and J. Malik. Proc. IEEE CVPR, 2008.
Concept-Based Video Retrieval. C.G.M. Snoek and M. Worring. Foundations and Trends in Information Retrieval, vol. 4(2), pp. 215-322, 2009.
Visual-Concept Search Solved? C.G.M. Snoek and A.W.M. Smeulders. IEEE Computer, vol. 43(6), pp. 76-78, 2010.
