TRECVID Story Segmentation based on Content-Independent Audio-Video Features

2004 TRECVID Workshop
TRECVID Story Segmentation based on Content-Independent Audio-Video Features
Keiichiro Hoashi, Masaru Sugano, Masaki Naito, Kazunori Matsumoto, Fumiaki Sugaya, Yasuyuki Nakajima
KDDI R&D Laboratories, Inc.
TRECVID 2004 Presentation Slides (Nov 15, 2004)

Outline
• Introduction
• System description
  • Baseline story segmentation method
    • SVM-based segmentation with low-level features
  • System components:
    • Section-specific segmentation
    • Anchor shot segmentation
    • Post-filtering
• Experiment results
• Conclusion

Introduction
• Motivation
  • Development of a generic story segmentation algorithm applicable to non-news video content
• Requirements
  • Utilize only low-level audio-video features which can be extracted from any video data
  • Restricted use of news-specific features (e.g., anchor shots)
  • Restricted use of text information (e.g., ASR results)
Main focus: Story segmentation under the “Audio+Video” experiment condition

Introduction (cont’d)
• However, content-specific features are necessary to achieve accurate segmentation
• Content-specific components were developed to complement the weak points of the baseline method
• Highly accurate story segmentation achieved!

Overview: Experiment results
[Figure 1. Recall, precision and F-measure of all “Audio+Video” TRECVID submissions. Bar chart, vertical axis from 0.0 to 1.0. KDDI runs: kddi_ss_all1_pfil, kddi_ss_all1nsp07_pfil, kddi_ss_all1, kddi_ss_c+k1, kddi_ss_all2nsp07_pfil, kddi_ss_all2_pfil, kddi_ss_base. Other runs: A-1, A-2, B-1, B-2, B-3, C-1, C-2, C-3, D-1, D-2, E-1.]
Outperformed all non-KDDI runs!
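For readers unfamiliar with the three measures in Figure 1, the following is a minimal sketch of how they relate, assuming the balanced (F1) form of the F-measure and assuming that detected boundaries have already been matched against the reference boundaries. The function name and the counts are hypothetical and purely illustrative; they are not results from the evaluation.

```python
def precision_recall_f(num_correct, num_detected, num_reference):
    """Return (precision, recall, F-measure) from boundary counts.

    num_correct   : detected boundaries that match a reference boundary
    num_detected  : all boundaries output by the system
    num_reference : all boundaries in the ground truth
    """
    precision = num_correct / num_detected if num_detected else 0.0
    recall = num_correct / num_reference if num_reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Illustrative counts only.
p, r, f = precision_recall_f(num_correct=6, num_detected=8, num_reference=12)
print(p, r, f)
```

Because the harmonic mean is pulled toward the lower of the two values, a run needs both good recall and good precision to reach a high F-measure.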

System Description

System outline
[Diagram: processing flow of the system]
• Baseline: input video → shot segmentation → feature extraction → SVM-based story segmentation
• Section-specialized segmentation: section extraction → section-specialized SVM
• Anchor shot segmentation: anchor shot extraction → anchor shot segmentation based on “silence” → story boundary addition
• Post-filter: filter candidates w/o silent segments and anchor shots

“Baseline” component
[System outline diagram, repeated]

Baseline story segmentation
• Procedures:
  • Shot segmentation
    • Merged TRECVID common shot boundaries with shot segmentation results of IBM VideoAnnEx tool
    • Applied “curtain-type” wipe detection method
  • Feature extraction
    • Extracts low-level audio-video features from each shot, and generates “shot vectors”
  • SVM-based story segmentation
    • Discriminates shots which contain story boundaries

Extracted audio-video features
• Audio
  • Average RMS
  • Avg RMS of first n frames
  • Frequency of audio class (silence, speech, music, noise)
    • Details in Reference [4]
• Color
  • Color layout of first, middle, and last frame (6*Y, 3*Cb, 3*Cr)
  • Color layout distance between first, middle and last frames
• Temporal
  • Shot duration
  • Shot density
• Motion
  • Horizontal motion
  • Vertical motion
  • Total motion
  • Motion intensity
Total number of elements: 51 (51-dimensional “shot vector”)
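The slide does not state how the 51 elements are divided among the feature groups. The sketch below shows one breakdown that happens to add up to 51, assuming 6 audio values (two RMS values plus four audio-class frequencies), 12 color layout coefficients (6 Y + 3 Cb + 3 Cr) for each of three frames, 3 pairwise color layout distances, 2 temporal values and 4 motion values. The per-group sizes other than the color layout coefficients, the function name and all feature values are assumptions made for illustration.

```python
def build_shot_vector(audio, color_layouts, color_distances, temporal, motion):
    """Concatenate per-shot feature groups into a single flat shot vector."""
    vector = list(audio)
    for layout in color_layouts:      # first, middle and last frame
        vector.extend(layout)
    vector.extend(color_distances)    # assumed: one distance per frame pair
    vector.extend(temporal)
    vector.extend(motion)
    return [float(x) for x in vector]


# Placeholder values; only the group sizes matter here.
shot_vector = build_shot_vector(
    audio=[0.12, 0.08, 0.6, 0.3, 0.1, 0.0],   # avg RMS, avg RMS of first n frames, 4 class frequencies
    color_layouts=[[0.0] * 12, [0.0] * 12, [0.0] * 12],
    color_distances=[0.0, 0.0, 0.0],
    temporal=[4.2, 0.5],                       # shot duration, shot density
    motion=[0.1, 0.0, 0.1, 0.3],               # horizontal, vertical, total, intensity
)
print(len(shot_vector))
```

Each shot then becomes one fixed-length row of input for the SVM, regardless of the shot's duration.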

SVM-based story segmentation
• Apply SVM to discriminate shots with a story boundary
• Training phase
  • Shots which contain a story boundary ⇒ Positive
  • All other shots ⇒ Negative
  [Diagram: timeline t with story boundaries marked]
• Evaluation phase
  • Extract N shots based on distance from SVM hyperplane
    • N = Average number of stories in ABC, CNN (Baseline)
    • N = Average number of stories x 1.5 (Extended baseline)
  • Set story boundary at beginning of each extracted shot
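The evaluation phase can be sketched as a top-N selection over per-shot SVM scores. This is a minimal illustration, assuming the scores are signed distances from the hyperplane in which larger values mean "more likely to contain a story boundary"; the function name, the sample scores and the rounding of N x 1.5 are assumptions, not details given on the slide.

```python
def select_story_boundaries(shot_starts, svm_scores, n):
    """Pick the n highest-scoring shots and return their start times.

    shot_starts : start time (seconds) of each shot
    svm_scores  : signed distance of each shot vector from the SVM hyperplane
    n           : number of shots to extract
    """
    ranked = sorted(range(len(shot_starts)),
                    key=lambda i: svm_scores[i], reverse=True)
    chosen = ranked[:n]
    # A story boundary is set at the beginning of each extracted shot.
    return sorted(shot_starts[i] for i in chosen)


# Hypothetical shots and scores.
shot_starts = [0.0, 12.5, 30.0, 41.2, 75.0, 90.3]
svm_scores = [1.4, -0.7, 0.2, -1.1, 0.9, -0.3]
avg_stories = 2  # stand-in for the average number of stories per broadcast

baseline = select_story_boundaries(shot_starts, svm_scores, avg_stories)
extended = select_story_boundaries(shot_starts, svm_scores,
                                   round(avg_stories * 1.5))
print(baseline)
print(extended)
```

Fixing N in advance sidesteps the choice of a score threshold; the extended setting simply admits more candidates, which tends to trade precision for recall.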

Problems of baseline method
• Although baseline results were satisfactory, several weak points were observed…
• Poor recall in various “sections”
  • e.g., Top Stories, Headline Sports of CNN
  • Cause: Different characteristics compared to general content
    • No anchor shots, background music, etc.
    • SVM unable to adapt to various features
• Impossible to detect multiple story boundaries that occur within a single shot
  • Baseline can only set one story boundary per shot

Additional system components
• Section-specialized segmentation
  • Objective: Improvement of recall in specific sections which have different characteristics
• Anchor shot segmentation
  • Objective: Detection of multiple story boundaries which occur within a single shot
• Post-filter
  • Objective: Improvement of precision

Component 1: Section-specialized segmentation
[System outline diagram, repeated]

Section-specialized segmentation
• General approach:
  • Construct SVM specialized for story segmentation within specified sections
• Procedures:
  • Section extraction
    • Extraction based on “jingles”, i.e., audio-video sequences which initiate sections
  • Section-specialized SVM
    • Construct SVM specialized to conduct story segmentation on extracted sections

Section extraction
• Automatic detection of “jingles” based on reference audio signals
  • Based on “Time-series active search” algorithm [Kashino]
• Extract sections based on position of extracted jingles
  [Diagram: timeline t with detected jingles (Start: Top Stories; Start: Dollars and Sense; Start: Headline Sports; End: Headline Sports) delimiting the Top Stories and Headline Sports sections]
• Apply section-specialized SVM to set story boundaries within each extracted section
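Once jingle positions are known, turning them into section intervals is straightforward. The sketch below covers only that step and does not implement the jingle detection itself. It assumes each detected jingle is labelled as a "start" or "end" marker, and that a section runs from its start jingle to the next jingle of any kind (or to the end of the program). The function name, the event format and the timestamps are hypothetical.

```python
def extract_sections(jingles, program_end):
    """Convert detected jingles into (section_name, start, end) intervals.

    jingles     : list of (time_sec, kind, section_name), kind is "start" or "end"
    program_end : time (seconds) at which the broadcast ends
    """
    sections = []
    open_name, open_time = None, None
    for time, kind, name in sorted(jingles):
        if open_name is not None:
            # Any following jingle closes the currently open section.
            sections.append((open_name, open_time, time))
            open_name, open_time = None, None
        if kind == "start":
            open_name, open_time = name, time
    if open_name is not None:
        sections.append((open_name, open_time, program_end))
    return sections


# Hypothetical jingle detections for one broadcast.
jingles = [
    (120.0, "start", "Top Stories"),
    (480.0, "start", "Dollars and Sense"),
    (900.0, "start", "Headline Sports"),
    (1140.0, "end", "Headline Sports"),
]
sections = extract_sections(jingles, program_end=1800.0)
for section in sections:
    print(section)
```

Shots falling inside an extracted interval would then be passed to the SVM trained for that section instead of the general one.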

Component 2: Anchor shot segmentation
[System outline diagram, repeated]

Anchor shot segmentation
• General approach:
  • Extract shots which are expected to contain multiple stories (anchor shots), and insert additional boundaries
• Procedures:
  • Anchor shot extraction
    • Construct SVM to discriminate anchor shots based on audio-video features
  • Extraction of “silent sections”
    • Two methods:
      • Audio classification results
      • HMM-based non-speech detector
  • Story boundary addition
    • Insert story boundaries at detected silence sections
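The boundary addition step can be sketched as an interval check: for every anchor shot, each silence lying strictly inside the shot yields one extra boundary. This is a minimal sketch that takes the anchor shots and silent sections as given. Placing the boundary at the midpoint of the silence, the minimum silence duration and all names and timestamps are assumptions for illustration; the slides do not specify them.

```python
def add_boundaries_in_anchor_shots(anchor_shots, silences, min_silence=0.5):
    """Return extra story boundaries placed at silences inside anchor shots.

    anchor_shots : list of (start_sec, end_sec) for shots classified as anchor shots
    silences     : list of (start_sec, end_sec) for detected silent sections
    min_silence  : assumed minimum silence duration (seconds) to count
    """
    boundaries = []
    for shot_start, shot_end in anchor_shots:
        for sil_start, sil_end in silences:
            inside = shot_start < sil_start and sil_end < shot_end
            if inside and (sil_end - sil_start) >= min_silence:
                # Assumed placement: middle of the silent section.
                boundaries.append((sil_start + sil_end) / 2)
    return sorted(boundaries)


# Hypothetical anchor shots and silences.
anchor_shots = [(0.0, 40.0), (100.0, 130.0)]
silences = [(18.0, 19.0), (39.8, 40.5), (60.0, 61.0),
            (110.0, 110.2), (120.0, 121.0)]
extra = add_boundaries_in_anchor_shots(anchor_shots, silences)
print(extra)
```

This addresses the baseline's limit of one boundary per shot: a long anchor shot covering several stories can receive one added boundary per qualifying silence.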

Component 3: Post-filter
[System outline diagram, repeated]
