
Adaptive Feature Discovery for TRECVID Broadcast News Video Story Segmentation



  1. Adaptive Feature Discovery for TRECVID Broadcast News Video Story Segmentation
     @TRECVID Workshop 2004, Nov. 15-16
     Winston Hsu (1), Lyndon Kennedy (1), Shih-Fu Chang (1), Martin Franz (3), John Smith (2), Giridharan Iyengar (3)
     (1) Dept. of Electrical Engineering, Columbia University, New York, NY
     (2) IBM T. J. Watson Research Center, Hawthorne, NY
     (3) IBM T. J. Watson Research Center, Yorktown Heights, NY
     http://www.ee.columbia.edu/~winston
     digital video | multimedia lab - Winston H.-M. Hsu -

  2. -2- trecvid workshop, 11/15/2004
     Outline
     • Features and fusion strategies
       - Multi-modal features at different observation windows (e.g., prosody, visual cues, text)
       - Fusion with Support Vector Machines
     • New focus in 2004:
       - Automatic Visual Cue Cluster Construction (VC³ framework)
       - Ability to handle diverse production events
       - Thorough error analysis for different genres
       - Brief comparison with last year's results

  5. Story Segmentation Model
     • Determine the candidate points
       - union of pauses and shot boundaries, with a 2.5-second fuzzy window
     • Extract and aggregate relevant features from surrounding windows
       - take into account asynchronous multi-modal features, e.g., text, audio
     • Classify the candidate points as “boundary” or “non-boundary”
       - SVMs with RBF kernels
     • Post-processing
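
The candidate-point step can be sketched in Python. The slide does not say exactly how the 2.5-second fuzzy window merges the two event streams, so this sketch assumes that shot boundaries are always kept and that a pause becomes a separate candidate only if no shot boundary lies within the window. The function name `candidate_points` is hypothetical.

```python
from typing import List


def candidate_points(shot_boundaries: List[float], pauses: List[float],
                     fuzzy_window: float = 2.5) -> List[float]:
    """Union of shot boundaries and pauses (times in seconds).

    Assumption: a pause within `fuzzy_window` seconds of a shot boundary is
    treated as the same candidate (absorbed by that shot boundary); all
    other pauses are kept as additional candidates.
    """
    shots = sorted(shot_boundaries)
    candidates = list(shots)
    for p in sorted(pauses):
        # Keep the pause only if no shot boundary falls inside the fuzzy window.
        if all(abs(p - s) > fuzzy_window for s in shots):
            candidates.append(p)
    return sorted(candidates)
```

For example, with shot boundaries at 10 s and 30 s and pauses at 11, 20, 31.5 and 45 s, the pauses at 11 s and 31.5 s are absorbed by the nearby shot boundaries, giving candidates at 10, 20, 30 and 45 s.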

  6. Raw Multi-Modal Features
     Modality | Raw feature              | Dim.*
     ---------+--------------------------+------
     Visual   | Visual cue clusters      | 15~40
              | Commercial               | 2
              | Motion                   | 2
     Audio    | Pause                    | 1
              | Prosody features         | 30
              | Speaker change           | 1
              | Speech rapidity          | 1
     Text     | Text story seg. scores   | 1
     * Dimensions before taking into account the different observation windows
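
The footnote on observation windows can be illustrated with a small aggregation helper. This is a hypothetical sketch, not the authors' feature extractor: it assumes one scalar feature sampled as (time, value) pairs, and the window widths are made-up values.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def window_features(samples: List[Tuple[float, float]], t: float,
                    windows: Sequence[float] = (2.0, 5.0, 10.0)) -> Dict[str, float]:
    """Aggregate one time-stamped scalar feature around candidate time t.

    Hypothetical scheme: for each window width w (seconds), average the
    samples in [t - w, t) and [t, t + w) and record their difference.
    """
    feats: Dict[str, float] = {}
    for w in windows:
        before = [v for ts, v in samples if t - w <= ts < t]
        after = [v for ts, v in samples if t <= ts < t + w]
        b = mean(before) if before else 0.0  # 0.0 when the window is empty
        a = mean(after) if after else 0.0
        feats[f"before_{w:g}"] = b
        feats[f"after_{w:g}"] = a
        feats[f"delta_{w:g}"] = a - b
    return feats
```

With three windows, each scalar feature expands to nine dimensions (before, after and delta per window), which illustrates how the dimensions in the table can grow once observation windows are taken into account.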

  7. Visual Cue Cluster Construction (VC³)
     • Motivation
       - News channels usually have different visual production events across channels or over time, and these events are statistically relevant to story boundaries
       - The usual approach is to manually enumerate all the production events by inspection and then train classifiers for them, e.g., ANCHOR, STUDIO, WEATHER, CNN_HEADLINE, etc.
       - Problem: deploying on multiple channels from multiple countries …
     • We hope to find a systematic way to discover "visual cue clusters"
       - Analogous to text, where cue words or cue word clusters are used
       - Automatically, rather than by human inspection
       - Avoid time-consuming news production annotation via Information Bottleneck clustering!

  8. VC³: the Information Bottleneck Principle
     • Cluster the features $X$ into clusters $T$ while still trying to preserve the mutual information $I(T;Y)$ with the label space $Y$, i.e., minimize $\mathcal{L} = I(T;X) - \beta\, I(T;Y)$
     • If $\beta \to \infty$, we obtain a hard partitioning; we only care about maximizing $I(T;Y)$, that is, minimizing the information loss $I(X;Y) - I(T;Y)$
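
In the hard-partition limit, one standard way to optimize this objective is the greedy agglomerative IB procedure: start from singleton clusters and repeatedly merge the pair whose merge loses the least $I(T;Y)$. That loss equals $(p(t_i)+p(t_j))$ times the weighted Jensen-Shannon divergence between $p(y|t_i)$ and $p(y|t_j)$, with weights $p(t_i)/p(t)$ and $p(t_j)/p(t)$. The sketch below assumes a discrete $p(x)$ (all entries positive) and $p(y|x)$ are already available; it illustrates agglomerative IB and is not necessarily the exact clustering procedure used in VC³.

```python
import math
from itertools import combinations
from typing import List, Tuple

Cluster = Tuple[List[int], float, List[float]]  # (members, p(t), p(y|t))


def kl(p: List[float], q: List[float]) -> float:
    """KL divergence D(p || q) in nats, skipping zero-probability terms of p."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def merge_cost(pt_i: float, py_i: List[float], pt_j: float, py_j: List[float]) -> float:
    """Loss in I(T;Y) caused by merging clusters i and j (hard partition)."""
    pt = pt_i + pt_j
    w_i, w_j = pt_i / pt, pt_j / pt
    m = [w_i * a + w_j * b for a, b in zip(py_i, py_j)]
    js = w_i * kl(py_i, m) + w_j * kl(py_j, m)  # weighted Jensen-Shannon divergence
    return pt * js


def agglomerative_ib(p_x: List[float], p_y_given_x: List[List[float]], k: int) -> List[Cluster]:
    """Greedily merge singleton clusters until k clusters remain."""
    clusters: List[Cluster] = [([i], p_x[i], list(p_y_given_x[i])) for i in range(len(p_x))]
    while len(clusters) > k:
        def cost(pair: Tuple[int, int]) -> float:
            (_, pi, yi), (_, pj, yj) = clusters[pair[0]], clusters[pair[1]]
            return merge_cost(pi, yi, pj, yj)

        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        (mi, pi, yi), (mj, pj, yj) = clusters[i], clusters[j]
        pt = pi + pj
        # The merged cluster's label distribution is the p(t)-weighted mixture.
        merged = (mi + mj, pt, [(pi * a + pj * b) / pt for a, b in zip(yi, yj)])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters
```

For four equally likely items whose label distributions are [0.9, 0.1], [0.85, 0.15], [0.1, 0.9] and [0.15, 0.85], asking for two clusters groups the first two items together and the last two together, since merging items with similar $p(y|x)$ costs the least mutual information.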

  10. VC³ Overview: a Simple Example
     [Figure: toy example with clusters c1, c2, c3]
     • Items (features) in the same cluster tend to have similar probability distributions over the event labels Y -> semantic consistency!!
     • MI contributions from different clusters -> feature selection
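
The "MI contributions" idea can be made concrete with the decomposition $I(T;Y) = \sum_t p(t)\, D_{KL}(p(y|t)\,\|\,p(y))$: each cluster contributes one non-negative term, and clusters with the largest terms are the most informative about the labels. This sketch ranks clusters that way; whether VC³ uses exactly this ranking rule for feature selection is an assumption, and the function names are hypothetical.

```python
import math
from typing import List


def mi_contributions(p_t: List[float], p_y_given_t: List[List[float]]) -> List[float]:
    """Per-cluster terms p(t) * KL(p(y|t) || p(y)); they sum to I(T;Y) in nats."""
    n_y = len(p_y_given_t[0])
    # Marginal label distribution p(y) = sum_t p(t) p(y|t).
    p_y = [sum(pt * pyt[y] for pt, pyt in zip(p_t, p_y_given_t)) for y in range(n_y)]
    contrib = []
    for pt, pyt in zip(p_t, p_y_given_t):
        div = sum(a * math.log(a / b) for a, b in zip(pyt, p_y) if a > 0)
        contrib.append(pt * div)
    return contrib


def select_clusters(p_t: List[float], p_y_given_t: List[List[float]], keep: int) -> List[int]:
    """Indices of the `keep` clusters contributing most to I(T;Y)."""
    c = mi_contributions(p_t, p_y_given_t)
    return sorted(range(len(c)), key=lambda t: c[t], reverse=True)[:keep]
```

A cluster whose label distribution matches the overall label distribution contributes nothing and is the first candidate to drop.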

  11. VC³ Overview: Joint Probability Approximation
     • For IB clustering, we essentially need the joint probability $p(x, y)$ of features and event labels
     • However, video features are not discrete but continuous!
     • Approximate the joint probability via kernel density estimation from the existing feature observations, using a Gaussian kernel with a specific kernel bandwidth and the observed event probability conditioned on each feature observation
     • Embed prior knowledge in the kernel function and the (D-dimensional) kernel bandwidth
       - Gaussian kernel (diagonal)
     • Raw features: autocorrelogram, color moments, and Gabor texture
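
A minimal sketch of the kernel density step, assuming a Nadaraya-Watson style estimate: $p(y|x_i)$ is the kernel-weighted fraction of observations with label $y$, using a diagonal Gaussian kernel with per-dimension bandwidths $\sigma_d$, and each observation gets $p(x_i) = 1/N$. The exact estimator in the original work is not recoverable from the slide, so treat the formula and the function names as hypothetical.

```python
import math
from typing import List, Sequence


def diag_gaussian(u: Sequence[float], sigma: Sequence[float]) -> float:
    """Unnormalized diagonal Gaussian kernel; constants cancel in the ratio below."""
    return math.exp(-sum((ud / sd) ** 2 for ud, sd in zip(u, sigma)) / 2.0)


def smoothed_joint(xs: List[List[float]], ys: List[int], n_labels: int,
                   sigma: Sequence[float]) -> List[List[float]]:
    """Return p(x_i, y) for every observation i.

    Assumptions: p(x_i) = 1/N, and p(y | x_i) is the kernel-weighted label
    frequency over all observations (including x_i itself).
    """
    n = len(xs)
    joint = []
    for xi in xs:
        weights = [diag_gaussian([a - b for a, b in zip(xi, xj)], sigma) for xj in xs]
        total = sum(weights)  # >= 1 because the self-weight is exp(0) = 1
        p_y = [0.0] * n_labels
        for w, y in zip(weights, ys):
            p_y[y] += w / total
        joint.append([p / n for p in p_y])
    return joint
```

The row sums of the returned matrix give $p(x_i)$ and the normalized rows give $p(y|x_i)$, which is the discrete input an IB clustering step expects.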

  12. VC³ Overview: Cluster Examples I
     • ABC VCs for story segmentation
     [Figure: example ABC visual cue clusters; annotation: cluster selection/feature reduction!!]

  13. VC³ Overview: Cluster Examples II
     • CNN VCs for story segmentation
     [Figure: example CNN visual cue clusters]

  14. VC³ Overview: Cluster Examples III
     • CNN VCs for text association; example word groups:
       - TEMPERATURE, SHOWER, RAIN, THUNDERSTORM, PRESSURE, …
       - POINT, WIN, PLAY, MICHAEL, GAME, …
       - POINT, DOLLAR, PERCENT, WORLD, DOW, NASDAQ, STREET
       - SPORT, HEADLINE, JAMES, GAMES, …
       - PRESIDENT, CLINTON, WHITE, DOLLAR, LEWINSKY, HOUSE, …

  15. VC³ Overview: Feature Projection
     • During feature extraction, project an image onto the induced cue clusters by calculating its membership probabilities, yielding K-dimensional VC features
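
Projecting a new image onto the K clusters can be sketched as follows, under the assumption that each cluster's density is a kernel density over its member samples and that cluster priors are proportional to cluster size; with those assumptions $p(c_k|x)$ is proportional to the summed kernel over the members of $c_k$. The function names and the uniform fallback are hypothetical choices.

```python
import math
from typing import List, Sequence


def diag_gaussian(u: Sequence[float], sigma: Sequence[float]) -> float:
    """Unnormalized diagonal Gaussian kernel (shared constant cancels)."""
    return math.exp(-sum((ud / sd) ** 2 for ud, sd in zip(u, sigma)) / 2.0)


def vc_features(x: Sequence[float], clusters: List[List[List[float]]],
                sigma: Sequence[float]) -> List[float]:
    """K-dim vector of membership probabilities p(c_k | x).

    Assumption: p(c_k) * p(x | c_k) is proportional to the summed kernel
    between x and the training samples assigned to cluster c_k.
    """
    scores = [sum(diag_gaussian([a - b for a, b in zip(x, m)], sigma) for m in members)
              for members in clusters]
    total = sum(scores)
    if total == 0.0:
        # x is so far from all samples that every kernel underflows: use uniform.
        return [1.0 / len(clusters)] * len(clusters)
    return [s / total for s in scores]
```

The resulting K values sum to one and can be used directly as the visual features of a shot or keyframe.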

  16. Performance Overview (A+V, Validation Set)
     [Figure: A+V results on CNN and on ABC]

  17. Performance Overview (A+V, Validation Set)
     [Figure: bar chart per story type (y-axis 0 to 35) with series Ratio (Overall), ME, VCs, and A+V; story types: cont., shrt anch. led, 2nd anch., in anch., sprt bref., sprt->comm, msc/anim, prev->comm, weather bref.]
     • Annotated 749 stories from 22 CNN videos into 9 story types
     • Fixed precision of 0.71; VC(*) evaluated at shot boundaries ONLY

  18. Performance Overview (A+V+T, Validation Set)
     • Revised A+V+T fusion approach
     • Over-fitting in the training set!!
     [Figure: fusion diagram combining V, A, and T through SVM fusion]
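
The fusion diagram itself did not survive extraction, but the SVM fusion step it refers to can be illustrated generically: per-modality scores (here a hypothetical A+V score and a text score) form a small vector $z$ that is classified by an RBF-kernel SVM with decision function $f(z) = \sum_i \alpha_i y_i \exp(-\gamma\|z - s_i\|^2) + b$. The support vectors, coefficients, bias and $\gamma$ would come from training; this sketch only evaluates a trained model and is not the revised fusion approach from the slide.

```python
import math
from typing import List, Sequence, Tuple


def rbf(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """RBF kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def svm_decision(z: Sequence[float],
                 support: List[Tuple[Sequence[float], float]],  # (s_i, alpha_i * y_i)
                 bias: float, gamma: float) -> float:
    """f(z) = sum_i alpha_i y_i K(s_i, z) + b; f(z) > 0 means "boundary"."""
    return sum(coef * rbf(s, z, gamma) for s, coef in support) + bias
```

With made-up support vectors ([0.9, 0.8], +1.0) and ([0.1, 0.2], -1.0), zero bias and γ = 2, the point [0.85, 0.75] gets a positive score (boundary) and [0.15, 0.1] a negative one.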

  19. TRECVID04 Test Result
     [Figure: F1 of the 10 Columbia_IBM submissions to the NIST TRECVID 2004 story segmentation task, plus best_of_others; runs: dT, AV_efc+efc, AV_efc+ec, AV_fc+fc, AVmT, AVmT_fc+fc, AVdT_fc+c, AVdT_fc+fc, AVmT_fc_c, mT]
     • Significant degradation (10%) compared with our two validation sets (A+V, A+V+T: 0.72+)
     • Probably due to (1) visual patterns or raw features having changed a lot in the test set; (2) the fusion strategy; (3) the selection of the decision threshold
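
Story segmentation results are reported as F1 over boundary times. A minimal sketch of precision, recall and F1, assuming a greedy one-to-one matching of predicted to reference boundaries within a tolerance window; the tolerance is a parameter here because the value used by the NIST evaluation is not stated on the slide, and `boundary_prf` is a hypothetical name.

```python
from typing import List, Tuple


def boundary_prf(predicted: List[float], reference: List[float],
                 tolerance: float) -> Tuple[float, float, float]:
    """Precision, recall and F1 for story boundaries (times in seconds).

    A predicted boundary is correct if it lies within `tolerance` seconds of
    a reference boundary not yet matched (greedy, one-to-one).
    """
    unmatched = sorted(reference)
    hits = 0
    for p in sorted(predicted):
        close = [r for r in unmatched if abs(p - r) <= tolerance]
        if close:
            unmatched.remove(min(close, key=lambda r: abs(p - r)))
            hits += 1
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1
```

For predictions at 10, 31 and 60 s against reference boundaries at 10.5, 30, 45 and 90 s with a 2 s tolerance, two predictions match, so precision is 2/3, recall is 1/2, and F1 is 4/7 ≈ 0.571.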

  20. Summary
     • Developed a novel information-theoretic framework to
       - discover visual cue clusters automatically
       - adapt to the diverse production events of different channels
       - avoid manual specification/annotation of salient visual cues
     • Results confirm the effectiveness of VCs on the validation set
       - But performance degrades on the test set due to the time gap
     • Multi-modal fusion
       - Fusion of A and V gives a significant improvement
       - Fusion of AV and T improves performance on ABC only
       - Fusion strategies are critical; simultaneous fusion is better
     • Major remaining errors
       - Short sports briefings
       - Suggest merging them into one continuous story in the ground truth

  21. < the end; thanks >
