

SLIDE 1

digital video | multimedia lab

Adaptive Feature Discovery for TRECVID Broadcast News Video Story Segmentation

@TRECVID Workshop 2004, Nov. 15-16

Winston Hsu 1, Lyndon Kennedy 1, Shih-Fu Chang 1, Martin Franz 3, John Smith 2, Giridharan Iyengar 3

http://www.ee.columbia.edu/~winston

  • 1 Dept. of Electrical Engineering, Columbia University, New York, NY
  • 2 IBM T. J. Watson Research Center, Hawthorne, NY
  • 3 IBM T. J. Watson Research Center, Yorktown Heights, NY

SLIDE 2

trecvid workshop, 11/15/2004

Outline

Features and Fusion Strategies

Multi-modal features at different observation windows

(e.g., prosody, visual cues, text)

Fusion with Support Vector Machines

New focus in 2004:

Automatic Visual Cue Cluster Construction (VC3 framework)

Ability to handle diverse production events

Thorough error analysis for different genres

Brief comparison with last year's results

SLIDE 3

Story Segmentation Model

  • Determine the candidate points
  • union of pauses and shot boundaries with fuzzy window 2.5 sec
SLIDE 4

Story Segmentation Model

  • Determine the candidate points
  • union of pauses and shot boundaries with fuzzy window 2.5 sec
  • Extract and aggregate relevant features from surrounding windows
  • take into account asynchronous multi-modal features; e.g., text, audio
SLIDE 5

Story Segmentation Model

  • Determine the candidate points
  • union of pauses and shot boundaries with fuzzy window 2.5 sec
  • Extract and aggregate relevant features from surrounding windows
  • take into account asynchronous multi-modal features; e.g., text, audio
  • Classify the candidate points as “boundary” or “non-boundary”
  • SVMs with RBF kernels
  • Post-processing

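The candidate-point step above can be sketched in Python. This is a minimal illustration of the stated rule only, not the authors' implementation; merging points that fall within the fuzzy window of an already-kept point is an assumption about how the union is deduplicated.

```python
# Sketch of candidate-point determination: union of pause and
# shot-boundary times, with nearby points collapsed using the
# 2.5-second fuzzy window from the slide. (Illustrative only.)

FUZZY_WINDOW = 2.5  # seconds


def candidate_points(pauses, shot_boundaries, window=FUZZY_WINDOW):
    """Merged, sorted candidate boundary times in seconds."""
    merged = []
    for t in sorted(pauses + shot_boundaries):
        # keep a point only if it is farther than the fuzzy window
        # from the last point we kept
        if not merged or t - merged[-1] > window:
            merged.append(t)
    return merged


print(candidate_points([0.0, 10.2, 31.0], [1.5, 10.9, 60.0]))
# -> [0.0, 10.2, 31.0, 60.0]
```

Each surviving time would then be classified as boundary/non-boundary by the SVM described on this slide.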

SLIDE 6

Raw Multi-Modal Features

Raw Features             Dim.    Modality
text story seg. scores   1       Text
pause                    1       Audio
prosody features         30      Audio
speaker change           1       Audio
speech rapidity          1       Audio
visual cue clusters      15~40   Visual
commercial               2       Visual
motion                   2       Visual

* before taking into account different observation windows

SLIDE 7

Visual Cue Cluster Construction (VC3)

Motivation

News channels have distinct visual production events, which vary across channels and over time and are statistically relevant to story boundaries

The usual approach: manually enumerate the production events by inspection, then train classifiers for them

e.g., ANCHOR, STUDIO, WEATHER, CNN_HEADLINE, …, etc.

Problem -> this does not scale when deploying on multiple channels of multiple countries …

We want a systematic framework to discover "visual cue clusters"

Analogous to text -> cue words or cue word clusters

Automatically, rather than by human inspection

Avoids time-consuming news production annotations

via Information Bottleneck Clustering!

SLIDE 8

VC3: the Information Bottleneck Principle

Cluster the feature space X into clusters C while still trying to preserve the mutual information I(C; Y) with the label space Y; i.e., minimize I(X; C) - β·I(C; Y)

If β -> ∞, a hard partitioning; we only care about maximizing I(C; Y), i.e., minimizing the information loss I(X; Y) - I(C; Y)

SLIDE 9

VC3 Overview: a Simple Example

SLIDE 10

VC3 Overview: a Simple Example

c1 c2 c3

  • Items (features) in the same cluster tend to have similar probability distributions over the event labels Y -> semantic consistency!!

  • MI contributions from different clusters -> feature selection

c1 c2 c3

SLIDE 11

VC3 Overview: Joint Probability Approximation

For IB clustering, we essentially need the joint probability p(x, y)

However, video features are not discrete but continuous!

Approximate the joint probability via kernel density estimation from the existing feature observations

Embed prior knowledge in the kernel functions and the (D-dimensional) kernel bandwidth

Gaussian kernel (diagonal) with a specific kernel bandwidth

  • observed event probability conditioned on the feature

Raw features: autocorrelogram, color moments, and Gabor texture
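A toy sketch of the kernel density approximation just described: estimate p(y | x) for a continuous feature from labeled observations using a diagonal Gaussian kernel. The bandwidths and the synthetic data are illustrative, not from the paper.

```python
# Kernel-smoothed estimate of p(y | x) from continuous observations,
# using a per-dimension (diagonal) Gaussian kernel. (Toy sketch.)
import numpy as np


def p_y_given_x(x, X, y, bandwidth):
    """Label probabilities at query x.

    X: (N, D) observed features, y: (N,) binary labels,
    bandwidth: (D,) per-dimension Gaussian sigma.
    """
    d = (x - X) / bandwidth                    # (N, D) scaled offsets
    k = np.exp(-0.5 * (d ** 2).sum(axis=1))   # Gaussian kernel weights
    w = k / k.sum()                            # normalized weights
    p1 = float((w * y).sum())                  # p(y = 1 | x)
    return np.array([1.0 - p1, p1])


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(4, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)
print(p_y_given_x(np.full(3, 4.0), X, y, bandwidth=np.ones(3)))
```

A query near the label-1 cloud gets p(y=1 | x) close to 1, which is the behavior the IB clustering consumes.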

SLIDE 12

VC3 Overview: Cluster Examples-I

ABC VCs for story seg.

cluster selection/feature reduction!!

SLIDE 13

VC3 Overview: Cluster Examples-II

CNN VCs for story seg.

SLIDE 14

VC3 Overview: Cluster Examples-III

CNN VCs for text association

POINT, WIN, PLAY, MICHAEL, GAME, …

POINT, DOLLAR, PERCENT, WORLD, DOW, NASDAQ, STREET

SPORT, HEADLINE, JAMES, GAMES, …

PRESIDENT, CLINTON, WHITE, DOLLAR, LEWINSKY, HOUSE, …

TEMPERATURE, SHOWER, RAIN, THUNDERSTORM, PRESSURE, …

SLIDE 15

VC3 Overview: Feature Projection

In feature extraction, project an image onto the induced cue clusters by calculating the membership probabilities

K-dim. VC Features
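The projection step can be sketched as follows. This is a hypothetical simplification: the clusters are represented only by centroids with a Gaussian affinity, whereas the actual method derives memberships from the kernel-density model of the induced clusters.

```python
# Map a raw image feature vector to a K-dim vector of soft
# membership probabilities over K visual cue clusters.
# (Hypothetical sketch: centroids + Gaussian affinity stand in
# for the induced cluster densities.)
import numpy as np


def vc_features(x, centroids, sigma=1.0):
    """K-dim soft membership of feature x over K cluster centroids."""
    d2 = ((x - centroids) ** 2).sum(axis=1)   # squared distances
    w = np.exp(-0.5 * d2 / sigma ** 2)        # Gaussian affinities
    return w / w.sum()                        # normalize to p(c | x)


centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # K = 3
print(vc_features(np.array([0.9, 0.1]), centroids))
```

The resulting K-dim vector is the "VC feature" fed to the SVM alongside the other modalities.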

SLIDE 16

Performance Overview (A+V, Validation Set)

[Charts: A+V results for ABC and for CNN]

SLIDE 17

[Chart: per-story-type ratio (%) for the Overall, ME, VCs, and A+V feature sets; values range 0.0-35.0]

Story types: anch. led, 2nd anch., cont. shrt bref., weather, prev->comm, sprt->comm, in anch., sprt bref., msc/anim

  • Annotate 749 stories into 9 types from 22 CNN videos
  • Fixed 0.71 precision; VC(*) evaluated at shot boundaries ONLY
SLIDE 18

Performance Overview (A+V+T, Validation Set)

[Diagram: fusion orderings over the V, A, and T modalities]

SVM fusion over-fits the training set!!

Revised A+V+T fusion approach

SLIDE 19

TRECVID 2004 Story Segmentation NIST Submission

[Chart: F1 (0.00-0.80) of the 10 Columbia_IBM runs (dT, mT, AV_efc+efc, AV_efc+ec, AV_fc+fc, AVmT, AVmT_fc+fc, AVdT_fc+c, AVdT_fc+fc, AVmT_fc_c) vs. best_of_others; selected values: 0.61, 0.57, 0.69, 0.65]

TRECVID 2004 test results

  • 10 Columbia_IBM submissions
  • Significant degradation (10%) compared with our two validation sets (A+V, A+V+T: 0.72+)
  • Probably due to: (1) visual patterns or raw features changed substantially in the test set; (2) the fusion strategy; (3) the selection of the decision threshold

SLIDE 20

Summary

Developed a novel information-theoretic framework to discover visual cue clusters

  • automatically adapts to the diverse production events of different channels
  • avoids manual specification/annotation of salient visual cues

Results confirm the effectiveness of VCs on the validation set

But the performance degrades in the test set due to time gap

Multi-modal fusion

Fusion of A and V yields a significant improvement

Fusion of AV and T improves performance on ABC only

Fusion strategies are critical: simultaneous fusion is better

Major remaining errors

Short sports briefings

Suggest merging them into a continuous story in the ground truth

SLIDE 21

< the end; thanks >