Commercial Detection in Heterogeneous Video Streams Using Fused Multi-Modal and Temporal Features
Masami Mizutani (Fujitsu Laboratories Ltd.), Shahram Ebadollahi (Columbia University), Shih-Fu Chang (Columbia University)
IEEE ICASSP 2005, Philadelphia
Outline
Motivation & Previous Work
Our Proposed Method
  Approach
  Local and Global Features for Commercial Detection
  Fusion
Experiment & Results
Conclusion
Motivation
CM (commercial) detection
Find CM and PG (program) boundaries in broadcast material
Applications:
CM-skip capability on digital PVRs, collecting CMs for marketing use, preprocessing for further content analysis of PGs, etc.
What’s the state of the art?
Previous Work
Dublin City University Group ('01) [Marlow01]
Heuristics using blank and silence detectors
Philips Research ('03) [Dimitrova03]
Use visual features (blank, scene change rate, text box location) from MPEG streams
Optimize the detection thresholds using a Genetic Algorithm
Carnegie Mellon University Group ('04) [Hauptmann04]
Did not use the blank feature; focused on color and audio
Identical CMs are broadcast many times
Find repetitious video segments in the stream as CM candidates, using SVMs in a hierarchical style
Previous Work (cont’d)
Reasonable performance, but the test data are limited and varied
Blank frames are proven powerful, but are not always present
CMs are not repetitious in a heterogeneous data set
Method                     Accuracy (F1 %)   # Programs (# Genres)   Total data / CM   Fusion Method
DCU01                      92                10 (a few?)             3.5h / 0.4h       Heuristics
Philips03                  89                24 (6 genres)           12h / 2.5h        Genetic Algorithm
CMU04                      91                10 (only news)          5h / 1.2h         Hierarchical SVMs
Our Method                 92                49 (6 genres)           36h / 9h          SVM + Duration HMM
We build a systematic method to fuse diverse features, including blank
We validate the results using a large, diverse data set
Our Approach
Classification problem of detected scene change points
The scene change detector works well at CM/PG boundaries (mostly hard cuts or fade in/out)
Use the pattern of multi-modal features in local windows located at scene change points (a counting sketch follows the figure below)
15-sec window: half the length of most CM clips
120-sec window: for capturing the starts/ends of clips having blanks
[Figure: Local feature windows at a scene change point on a PG-CM-PG timeline. A 15-sec window and a 120-sec window are placed at each scene change; extracted features: Blank Frame Rate (1 bin), Scene Change Rate (1 bin), Audio (4 bins), Color (12 bins), Overlay Text Location (16x16 = 256 bins).]
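A minimal sketch of the window-based counting, assuming windows centered at the scene change point (the paper's exact window placement may differ); the function names and timestamps below are hypothetical:

```python
import bisect

def count_in_window(event_times, center, half_width):
    """Count events (e.g., blank frames or scene changes) inside
    the window [center - half_width, center + half_width]."""
    lo = bisect.bisect_left(event_times, center - half_width)
    hi = bisect.bisect_right(event_times, center + half_width)
    return hi - lo

# Hypothetical event lists (timestamps in seconds, sorted).
scene_changes = [2.0, 14.5, 30.1, 31.0, 45.2, 90.7]
blank_frames  = [29.9, 30.05, 60.3, 121.0]

# Local features attached to the scene change at t = 30.1 s:
t = 30.1
sc_rate = count_in_window(scene_changes, t, 7.5)    # 15-sec window
bf_rate = count_in_window(blank_frames,  t, 60.0)   # 120-sec window
print(sc_rate, bf_rate)
```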
Our Approach (cont’d)
Use not only local features but also a global temporal feature
CM and PG are interleaved in each program
The density and locations of CMs in the entire program stream depend on genres and broadcast sources
[Figure: Timeline of interleaved CM and PG segments with scene change times t_{i-1}, t_i, t_{i+1}, t_{i+2}, and example likelihood distributions of the inter-arrival time of CM segments for (a) all genres, (b) sports, (c) movies. CMs recur more quickly in sports than in movies.]
Problem Formulation
Define two hidden states (CM, PG) at scene change points
Model them as Markov Chain with:
Duration feature: duration of stay in a state
Fused local features: observed content features at a state
Detection of CM/PG boundaries is formulated as the problem of inferring the optimal state sequence with a Duration Viterbi algorithm (a decoding sketch follows the figure below)
[Figure: Two-state Markov chain over scene change points with states CM and PG, duration models d(CM) and d(PG), and fused local features f observed at each scene change.]
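A simplified sketch of duration-based (segmental) Viterbi decoding over scene change points; this illustrative two-state decoder and its inputs are hypothetical, not the authors' implementation:

```python
import math

def duration_viterbi(obs_loglik, dur_loglik, max_dur):
    """Segmental (duration) Viterbi over scene change points.

    obs_loglik[t][s]: log-likelihood of the fused feature at point t under state s
    dur_loglik(s, d): log-likelihood of staying d consecutive points in state s
    States: 0 = PG, 1 = CM (hypothetical encoding).
    """
    T = len(obs_loglik)
    best = [[-math.inf, -math.inf] for _ in range(T + 1)]
    back = [[None, None] for _ in range(T + 1)]   # (segment start, previous state)
    best[0] = [0.0, 0.0]                          # empty prefix may precede either state
    for t in range(1, T + 1):
        for s in (0, 1):
            for d in range(1, min(max_dur, t) + 1):
                emit = sum(obs_loglik[i][s] for i in range(t - d, t))
                score = best[t - d][1 - s] + dur_loglik(s, d) + emit
                if score > best[t][s]:
                    best[t][s] = score
                    back[t][s] = (t - d, 1 - s)
    # Backtrack the segment labels.
    s = 0 if best[T][0] >= best[T][1] else 1
    labels, t = ["PG"] * T, T
    while t > 0:
        start, prev_s = back[t][s]
        labels[start:t] = (["PG"] if s == 0 else ["CM"]) * (t - start)
        t, s = start, prev_s
    return labels

# Tiny usage example with made-up log-likelihoods and a toy duration penalty:
obs = [[-0.2, -1.8], [-0.3, -1.5], [-1.6, -0.4], [-1.7, -0.3], [-0.2, -1.9]]
print(duration_viterbi(obs, lambda s, d: -0.5 * d, max_dur=3))
```

Bounding the segment length by max_dur and alternating states between adjacent segments keeps the decoding tractable while letting the duration model score each stay as a whole.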
Modeling Duration of Stay
Duration of PG: Erlang Mixture Model
Erlang is better for fitting positive samples [Vasconcelos00]
The mixture model is for fitting various genres; the fit is confirmed by a Kolmogorov-Smirnov test
Duration of CM: a uniform distribution
Both models are bounded by the max & min durations in the training data
The normalized actual duration of stay is considered (a density sketch follows the figure below)
Now, let’s see feature extraction and fusion …
[Figure: Duration models. PG duration: Erlang mixture supported on [minPG, maxPG]; CM duration: uniform density 1/(maxCM - minCM) on [minCM, maxCM].]
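A minimal sketch of evaluating these duration densities; the mixture parameters and bounds are made up, and the sketch skips renormalizing the truncated Erlang mixture:

```python
import math

def erlang_pdf(x, k, lam):
    """Erlang density with integer shape k and rate lam."""
    return lam**k * x**(k - 1) * math.exp(-lam * x) / math.factorial(k - 1)

def pg_duration_likelihood(d, components, d_min, d_max):
    """Erlang mixture for PG durations, bounded to the training range.
    components: list of (weight, shape, rate) tuples (hypothetical values)."""
    if not (d_min <= d <= d_max):
        return 0.0
    return sum(w * erlang_pdf(d, k, lam) for w, k, lam in components)

def cm_duration_likelihood(d, d_min, d_max):
    """Uniform density for CM durations over the training range."""
    return 1.0 / (d_max - d_min) if d_min <= d <= d_max else 0.0

# Example with made-up parameters (durations in seconds):
pg_mix = [(0.6, 3, 0.01), (0.4, 5, 0.005)]
print(pg_duration_likelihood(600.0, pg_mix, 120.0, 3000.0))
print(cm_duration_likelihood(30.0, 10.0, 240.0))
```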
Feature Extraction: Scene Change, Blank and Overlay Text
Use a scene change (SC) detector [Zhong02] and a simple blank frame (BF) detector
# of SCs in 15 sec and # of BFs in 120 sec
Use an overlay text location detector based on motion vectors and texture energy [Zhang03]
Detection results from every 5th frame are mapped onto a 2D grid (16x16 bins), capturing the location and frequency of overlay texts appearing in 15 sec (a binning sketch follows the figure below)
[Figure: # of SCs counted in the 15-sec window and # of BFs in the 120-sec window around a scene change; overlay text detections mapped onto a 16x16 grid (16 = 352 pix / 22, 16 = 240 pix / 15), i.e., 256 bins.]
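A minimal sketch of the 16x16 spatial binning; the box format is hypothetical, and binning each detection by its box center is a simplification (the detector may instead mark every cell a box covers):

```python
def text_location_histogram(text_boxes, frame_w=352, frame_h=240, grid=16):
    """Accumulate overlay-text detections into a 16x16 spatial histogram
    (256 bins). Each box is (x, y, w, h) in pixels (hypothetical format)."""
    cell_w, cell_h = frame_w / grid, frame_h / grid   # ~22 x 15 pixels per cell
    hist = [0] * (grid * grid)
    for x, y, w, h in text_boxes:
        cx, cy = x + w / 2.0, y + h / 2.0             # box center
        col = min(int(cx // cell_w), grid - 1)
        row = min(int(cy // cell_h), grid - 1)
        hist[row * grid + col] += 1
    return hist

# Detections sampled every 5 frames within the 15-sec window (made-up boxes).
boxes = [(10, 200, 120, 20), (12, 202, 118, 20), (200, 10, 80, 18)]
print(sum(text_location_histogram(boxes)))  # 3 detections binned
```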
Feature Extraction: Audio & Color
Audio (4 bins): use an HMM-based classifier on MFCC features
Each 1 sec of audio is classified into {silence, speech, music, music/speech}; count each class within 15 sec
Color (12 bins): use the histogram of 12 predetermined palette colors over the shots in 15 sec [Wei04]
The palette color of each shot is determined from the 3 dominant colors of its keyframe
[Figure: Audio class counts per 1-sec unit and palette color counts per shot within the 15-sec window. The 12 palette colors equally divide the L*u*v color space.]
Fuse Multi-Modal Features
[Figure: Late fusion architecture. BF Rate (1 bin) and SC Rate (1 bin) feed Classifiers #1 and #2 (Poisson, ML); Audio (4 bins), Color (12 bins), and Overlay Text (256 bins) feed Classifiers #3-#5 (SVM w/ RBF); a final SVM (w/ RBF) fuses their outputs.]
Fuse into a single posterior probability in a late fusion (2-step) style, due to the great diversity of the features
Use a local two-class (CM/PG) classifier for each modality
Find the posterior of CM using a sigmoid function for the SVM outputs [Platt99] and Bayes rule for the ML classifiers (a conversion and fusion sketch follows the equations below)
Another SVM fuses the posteriors and finds the final posterior of CM; this fused feature is fed to the Markov chain
Sigmoid function for SVM outputs:
$P(\mathrm{CM} \mid x) \approx f(x) = \frac{1}{1 + e^{\alpha x + \beta}}$

Bayes rule for the ML classifiers:
$P(\mathrm{CM} \mid o) = \frac{1}{1 + \frac{P(o \mid \mathrm{PG})\,P(\mathrm{PG})}{P(o \mid \mathrm{CM})\,P(\mathrm{CM})}}$
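A rough illustration of the two conversions above and the second-stage fusion; the parameter values, priors, and the `fusion_svm` object are hypothetical, and in the real system alpha, beta and the fusion SVM are fitted on validation data:

```python
import math

def svm_posterior(score, alpha, beta):
    """Platt-style sigmoid mapping an SVM score to P(CM | x).
    alpha, beta are fitted on held-out data (values here are made up)."""
    return 1.0 / (1.0 + math.exp(alpha * score + beta))

def ml_posterior(lik_cm, lik_pg, prior_cm=0.5):
    """Bayes rule turning class likelihoods (e.g., Poisson) into P(CM | o)."""
    prior_pg = 1.0 - prior_cm
    return 1.0 / (1.0 + (lik_pg * prior_pg) / (lik_cm * prior_cm))

# The per-modality posteriors form the input vector of the fusion SVM:
posteriors = [
    ml_posterior(0.12, 0.03),          # blank-frame rate (Poisson likelihoods)
    ml_posterior(0.20, 0.15),          # scene-change rate (Poisson likelihoods)
    svm_posterior(-0.8, -1.7, 0.1),    # audio SVM score
    svm_posterior(0.3, -1.7, 0.1),     # color SVM score
    svm_posterior(1.1, -1.7, 0.1),     # overlay-text SVM score
]
print(posteriors)
# fused = fusion_svm.predict_proba([posteriors])  # hypothetical second-stage SVM
```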
Experimental Data Set
Heterogeneous data set:
49 programs from 6 US local/national channels
Including 6 genres: News, Drama, Animation, Entertainment, Movie, Sports
36 hours in total, including 9 hours of commercials
Starts of CM and PG segments are labeled manually
3-fold cross validation (training, validation, testing)
[Table: Recording schedule of the 49 programs. Evening slots (6:00PM-11:30PM): WB11 (Fri 3/12/04), UPN9 (Sat 3/13/04), FOX5 (Sun 3/14/04), NBC (Tue 3/16/04); afternoon slots (12:00PM-5:30PM): ABC7 (Mon 3/15/04), CBS2 (Thu 3/18/04). Half-hour slots span INFO (daily news, politics, sports news), DRAMA (mostly sitcoms), MOVIE, ANIME, ENT (quiz, gossip, talk shows), and a SPORTS event (basketball tournament).]
Performance Metric
F1 [Dimitrova03] for counting correctly classified boundaries (a small computation example follows the formulas below)
Each scene change point is a candidate, labeled positive (CM) or negative (PG)
Higher is better, but it cannot account for short errors
[Figure: Ground truth vs. detection result timelines, with scene change points scored as TN, FP, TP, FN.]
$R = \frac{TP}{TP + FN}$ (Recall),  $P = \frac{TP}{TP + FP}$ (Precision),  $F_1 = \frac{2PR}{P + R}$
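A minimal sketch of the metric computation over the scene-change candidates (the counts are made up):

```python
def f1_from_counts(tp, fp, fn):
    """Precision, recall, and F1 over scene-change points labeled CM (positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Example counts: 94 true positives, 15 false positives, 6 misses.
print(f1_from_counts(94, 15, 6))
```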
Performance Metric (cont’d)
WindowDiff [Pevzner02] measures discrepancies between the ground truth (ref) and the detection result (hyp); a computation sketch follows the formula below
Widely used for text segmentation. Lower is better.
$WD(ref, hyp) = \frac{1}{N - k} \sum_{i=1}^{N-k} \mathbf{1}\big( \lvert b(ref_i, ref_{i+k}) - b(hyp_i, hyp_{i+k}) \rvert > 0 \big)$

N: # of shots in the entire stream; k: avg. number of shots in PG and CM segments; b(i, j): # of PG/CM boundaries between positions i and j

[Figure: Reference (Ref) and hypothesis (Hyp) sequences of scene change shots with PG/CM boundaries, compared over a sliding window of k shots.]
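A minimal sketch of WindowDiff over boundary-indicator sequences; the exact convention for counting boundaries between positions i and i+k may differ slightly from the paper:

```python
def window_diff(ref_bounds, hyp_bounds, n_shots, k):
    """WindowDiff between reference and hypothesis boundary indicators.
    ref_bounds / hyp_bounds: length-n_shots lists with 1 where a PG/CM
    boundary occurs; k: average segment length in shots."""
    errors = 0
    for i in range(n_shots - k):
        b_ref = sum(ref_bounds[i:i + k])   # boundaries in the window starting at i
        b_hyp = sum(hyp_bounds[i:i + k])
        if b_ref != b_hyp:
            errors += 1
    return errors / (n_shots - k)

# Toy example: one shifted boundary in a 12-shot stream, window of 3 shots.
ref = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
hyp = [0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0]
print(window_diff(ref, hyp, len(ref), 3))
```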
Experimental Result
Shows the importance of the duration feature
Point Decision: using all features is best
But it produces many short erroneous fragmentations (see the WD rates)
Viterbi: drastically reduces the fragmentation errors
Duration Viterbi:
With all features, achieves F1 = 92%
Even without the blank feature, works very well (F1 = 85%, WD = 25%)
                           a) Only blank feature        b) Other features            c) All features
Metric                     F1 (R, P)          WD        F1 (R, P)          WD        F1 (R, P)          WD
1) Point Decision          87% (94%, 81%)     23%       80% (85%, 75%)     71%       90% (94%, 86%)     38%
2) Viterbi + (1)           87% (95%, 81%)     22%       84% (89%, 79%)     31%       91% (96%, 87%)     16%
3) Duration Viterbi + (1)  89% (95%, 83%)     21%       85% (90%, 80%)     25%       92% (95%, 88%)     15%
- Table. Average performance of 49 streams
Experimental Result (cont’d)
Shows robustness on the heterogeneous data set
Blank, although powerful, is not always available
Relying on only one strong feature is not good (see the worst-case performance of 50% below)
Robust fusion of all features is important, including the temporal duration modeling
[Figure: Histogram of per-stream accuracy, F1 (%) vs. # of streams. Green: only blank feature (unreliable, with worst cases near 50%); Yellow: other features; Red: all features.]
- Figure. The distribution of detection accuracy (F1) on the 49 streams when using Duration Viterbi
Example of detection result
[Figure: Example detection result on a 30-min news program (CBS2, test pg id = 1, frames 61298 to 64878), RED: PG, BLUE: CM, all using Duration Viterbi. Panels: ground truth (GT) with locations of blank frames; a) only blank; b) other features without blank; c) all features including blank. Annotations: short distance between blanks; a PG segment found without blank; boundary ambiguity due to Viterbi processing, so the detected start of CM is delayed; more temporal smoothing is needed.]
Conclusion
Our proposed method:
A systematic late fusion method based on a Markov model, combining local features and a global duration feature
Experimental results on heterogeneous data show:
Effectiveness of the duration feature and the multi-modal feature fusion
Robustness of our method even when the blank feature is not present
Future work:
Incorporation of a larger feature pool
Comparison with prior work under the same conditions and data set
End
Thank you very much! Any questions?
References
[Marlow01] "Audio and Video Processing for Automatic TV Advertisement Detection", ISSC, 2001
[Dimitrova03] "Evolvable Visual Commercial Detector", CVPR, 2003
[Hauptmann04] "Comparison and Combination of Two Novel Commercial Detection Methods", ICME, 2004
[Vasconcelos00] "Statistical Models of Video Structure for Content Analysis and Characterization", IEEE Trans. on Image Processing, 2000
[Zhong02] "Segmentation, Index and Summarization of Digital Video Content", Ph.D. Thesis, Columbia Univ., 2002
[Zhang03] "Accurate Overlay Text Extraction for Digital Video Analysis", ITRE, 2003
[Platt99] "Probabilistic Outputs for Support Vector Machines and Comparison to Regularized Likelihood Methods", Advances in Large Margin Classifiers, MIT Press, 1999
[Pevzner02] "A Critique and Improvement of an Evaluation Metric for Text Segmentation", Computational Linguistics, 2002
[Wei04] "Color-Mood Analysis of Films Based on Syntactic and Psychological Models", ICME, 2004