BBN VISER TRECVID MED 11 System
1/12/2012 1
Outline
– Overview
– Feature Extraction
  – Low-level Features
  – High-level Features: Objects and Concepts
  – Automatic Speech Recognition (ASR) Features
  – Videotext OCR
– Event Detection
[Figure: bimodal BOW. Labels: Visual Words; Audio Words; Words Grouping (Group1, Group2, Group3); Bimodal BOW; Audio BOW; Visual BOW.]
[Figure labels: projection value of one code-word's coefficient; max; avg; normalized frequency.]
[Figure: low-level feature extraction in three stages: Point Sampling Strategy, Descriptor Computation, BoW Representation. A video frame is divided by spatial pyramids (1x1, 2x2, 1x3); the descriptors in each pyramid (histograms, ColorSIFT, …) are passed through vector quantization.]
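The spatial-pyramid BoW representation named on this slide can be illustrated with a minimal sketch. It assumes descriptors have already been vector-quantized to word indices, and it reads "1x3" as one column by three rows (horizontal strips); the function name, the `(cols, rows)` convention, and the per-cell L1 normalization are assumptions made for illustration, not details from the slides.

```python
from typing import List, Sequence, Tuple

# (x, y, word_id): a quantized descriptor located at pixel (x, y).
Point = Tuple[float, float, int]


def spatial_pyramid_bow(
    points: Sequence[Point],
    width: float,
    height: float,
    vocab_size: int,
    grids: Sequence[Tuple[int, int]] = ((1, 1), (2, 2), (1, 3)),
) -> List[float]:
    """Concatenate one L1-normalized word histogram per pyramid cell.

    Each grid is (cols, rows); this convention is an assumption.
    """
    feature: List[float] = []
    for cols, rows in grids:
        cells = [[0.0] * vocab_size for _ in range(cols * rows)]
        for x, y, word in points:
            # Clamp so points on the right/bottom border fall in the last cell.
            cx = min(int(x * cols / width), cols - 1)
            cy = min(int(y * rows / height), rows - 1)
            cells[cy * cols + cx][word] += 1.0
        for hist in cells:
            total = sum(hist)
            feature.extend(h / total if total else 0.0 for h in hist)
    return feature
```

With the three grids above, the output has (1 + 4 + 3) x `vocab_size` entries: one histogram for the whole frame, four for the quadrants, and three for the strips.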
[Figure: example of car detection in a video frame; detections are accumulated over time.]
[Figure: ASR pipeline. Labels: Video Clip (Audio track); Speech Activity Detection; Speech segments; ASR; ASR transcripts.]
Example ASR transcript: … I'M MAKING A HEALTHY ALBACORE TUNA SANDWICH [UH] WITH NO MALE [UH] OR GOING_TO HAPPEN IS WE'RE GOING TO HAVE SOME SOLID WHITE ALBACORE TUNA …
[Figure: ASR-based event detection. Labels: ASR transcripts; Identified Keywords; Normalized Keyword Histogram (e.g. sandwich: 4, tablespoon: 1, mayonnais: 1, …); Extract Discriminant Keywords; SVM (target vs non-target); P(target event | …]
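The normalized keyword histogram step can be sketched as follows. This is a minimal illustration, not the system's implementation: the function name, the tokenization rule, the removal of bracketed fillers such as [UH], and the L1 normalization are all assumptions, and the stemming suggested by the slide's "mayonnais" entry is left out for brevity. The resulting vector is the kind of input an SVM (target vs non-target) would consume.

```python
import re
from collections import Counter
from typing import Dict, Iterable


def keyword_histogram(transcript: str, keywords: Iterable[str]) -> Dict[str, float]:
    """L1-normalized counts of the given keywords in an ASR transcript."""
    keyword_list = list(keywords)
    keyword_set = set(keyword_list)
    # Drop bracketed filler tokens such as [UH], then lowercase and tokenize.
    text = re.sub(r"\[[^\]]*\]", " ", transcript).lower()
    tokens = re.findall(r"[a-z_']+", text)
    counts = Counter(t for t in tokens if t in keyword_set)
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in keyword_list}
```

For example, with the hypothetical keyword list `["sandwich", "tuna", "mayonnaise"]`, the transcript excerpt on the slide yields a histogram in which "tuna" carries twice the weight of "sandwich" and "mayonnaise" is zero.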
[Figure: videotext OCR event detection. Labels: OCR output; Measurement on event-dependent concurrent words; Concurrence scores; Max-pooling; Event scores; Thresholding; High-precision hypotheses; video clips retrieved with high confidence; Combining with …]
Example for “making a sandwich”:
Predefined concurrent words: [turkey, sandwich], [bell, pepper], [butter, peanut], [fish], …
OCR output: … can … fish … snadwich … turky … we can … take …
Concurrence scores are converted to an OCR-based event score by max-pooling.
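The conversion from concurrence scores to an OCR-based event score can be sketched as below. Only the max-pooling over word groups comes from the slide; how each concurrence score is measured is not stated, so this sketch substitutes an assumed fuzzy string match (Python's `difflib.SequenceMatcher`) averaged over the words of a group, which tolerates OCR errors such as "snadwich" and "turky". All function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Sequence


def best_match(word: str, tokens: Sequence[str]) -> float:
    """Highest similarity ratio between `word` and any OCR token (0.0 if none)."""
    return max((SequenceMatcher(None, word, t).ratio() for t in tokens), default=0.0)


def concurrence_score(group: Sequence[str], tokens: Sequence[str]) -> float:
    """Assumed measure: mean best-match similarity over the words in one group."""
    return sum(best_match(w, tokens) for w in group) / len(group)


def ocr_event_score(groups: Sequence[Sequence[str]], ocr_text: str) -> float:
    """Max-pool the per-group concurrence scores into one event score."""
    tokens = ocr_text.lower().split()
    return max((concurrence_score(g, tokens) for g in groups), default=0.0)
```

Comparing the event score against a high threshold would then keep only high-precision hypotheses, in line with the thresholding step on the slide; the threshold value itself is not given in the source.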
[Figure: event detection system. Labels: Extracted Features; Kernel-based Early Fusion; Threshold Estimation; Joint Optimization; Sub-System 1, Sub-System 2, …, Sub-System N; System Combination; Final Score.]
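As a rough illustration of the system combination step, the sketch below merges N sub-system scores into a final score with a weighted average. This is a generic late-fusion stand-in chosen for simplicity, not the combination rule used in the system (the slides do not give it at this point); the function name and the weights are hypothetical.

```python
from typing import Sequence


def combine_scores(subsystem_scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of sub-system scores (a generic late-fusion stand-in)."""
    if len(subsystem_scores) != len(weights):
        raise ValueError("need exactly one weight per sub-system")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, subsystem_scores)) / total
```

In practice the weights would be estimated on held-out data; equal weights reduce this to a plain mean of the sub-system scores.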
… produces significant gains
[Figure labels: Low-level Features; BAYCOM Fusion; Fusion; PMD optimization.]
… significant gains
… performance at a single point on the DET curve (detection threshold) and is sub-optimal at other points
… improves performance over the entire DET curve
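To make the "single point on the DET curve" remark concrete, the sketch below computes the operating points of a DET curve: each detection threshold yields one pair of missed-detection and false-alarm rates, so tuning for one threshold fixes only one such pair. The function name and the convention that a clip is detected when its score is at or above the threshold are assumptions.

```python
from typing import List, Sequence, Tuple


def det_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float, float]]:
    """Return (threshold, p_miss, p_fa) for each distinct score used as threshold.

    labels: 1 for target-event clips, 0 for non-target clips.
    A clip counts as detected when its score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one target and one non-target clip")
    points = []
    for thr in sorted(set(scores)):
        missed = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        false_alarms = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        points.append((thr, missed / n_pos, false_alarms / n_neg))
    return points
```

Sweeping the threshold from low to high trades false alarms for missed detections; the full list of points traces the curve, while any single threshold picks out one entry.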
… minimal
– Most of the videos did not have any associated audio or text information for ASR or videotext OCR to work
– Scene and object concepts were not helpful either