TRECVID Story Segmentation based on Content-Independent Audio-Video Features


  1. 2004 TRECVID Workshop
     TRECVID Story Segmentation based on Content-Independent Audio-Video Features
     Keiichiro Hoashi, Masaru Sugano, Masaki Naito, Kazunori Matsumoto, Fumiaki Sugaya, Yasuyuki Nakajima
     KDDI R&D Laboratories, Inc.
     KDDI R&D Laboratories, Inc. TRECVID 2004 Presentation Slides 1 (Nov 15, 2004)

  2. Outline
     - Introduction
     - System description
       - Baseline story segmentation method
         - SVM-based segmentation w/ low-level features
       - System components:
         - Section-specific segmentation
         - Anchor shot segmentation
         - Post-filtering
     - Experiment results
     - Conclusion

  3. Introduction
     - Motivation
       - Development of a generic story segmentation algorithm applicable to non-news video content
     - Requirements
       - Utilize only low-level audio-video features which can be extracted from any video data
       - Restricted use of news-specific features (e.g., anchor shots)
       - Restricted use of text information (e.g., ASR results)
     - Main focus: Story segmentation under the “Audio+Video” experiment condition

  4. Introduction (cont’d)
     - However, content-specific features are necessary to achieve accurate segmentation
       - Content-specific components developed to complement weak points of baseline method
     - Highly accurate story segmentation achieved!

  5. Overview: Experiment results
     [Figure 1. Recall, precision and F-measure of all “Audio+Video” TRECVID submissions. Bar chart, vertical axis 0.0 to 1.0. Runs shown: kddi_ss_all1_pfil, kddi_ss_all1nsp07_pfil, kddi_ss_all1, kddi_ss_c+k1, kddi_ss_all2nsp07_pfil, kddi_ss_base, kddi_ss_all2_pfil, A-1, A-2, B-1, B-2, B-3, C-1, C-2, C-3, D-1, D-2, E-1]
     - Outperformed all non-KDDI runs!
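The three metrics in Figure 1 can be illustrated with a small scoring sketch. It assumes boundaries are given as times in seconds and that a boundary counts as a match when it lies within a fixed tolerance of a boundary in the other list; the `tolerance` parameter and the function names are illustrative and do not come from the slides.

```python
from typing import Sequence, Tuple


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def score_boundaries(
    reference: Sequence[float],
    detected: Sequence[float],
    tolerance: float = 5.0,  # assumed matching window in seconds
) -> Tuple[float, float, float]:
    """Return (recall, precision, F-measure) for two lists of boundary times."""
    # Recall: share of reference boundaries with a detected boundary nearby.
    found = sum(any(abs(r - d) <= tolerance for d in detected) for r in reference)
    # Precision: share of detected boundaries that are near a reference boundary.
    correct = sum(any(abs(r - d) <= tolerance for r in reference) for d in detected)
    recall = found / len(reference) if reference else 0.0
    precision = correct / len(detected) if detected else 0.0
    return recall, precision, f_measure(precision, recall)


# Toy data, not taken from the experiments.
recall, precision, f = score_boundaries([10, 50, 100], [12, 48, 70, 200])
```

With these toy inputs, two of three reference boundaries are found and two of four detections are correct, so recall is 2/3 and precision is 1/2.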

  6. System Description

  7. System outline
     [Diagram: system outline]
     - Baseline: Input video → shot segmentation → feature extraction → SVM-based story segmentation
     - Section-specialized segmentation: section extraction → section-specialized SVM
     - Anchor shot segmentation: anchor shot extraction → anchor shot segmentation based on “silence” → story boundary addition
     - Post-filter: Filter candidates w/o silent segments and anchor shots

  8. “Baseline” component
     [Diagram: system outline, repeated from slide 7]

  9. Baseline story segmentation
     - Procedures:
       - Shot segmentation
         - Merged TRECVID common shot boundaries with shot segmentation results of IBM VideoAnnEx tool
         - Applied “curtain-type” wipe detection method
       - Feature extraction
         - Extracts low-level audio-video features from each shot, and generates “shot vectors”
       - SVM-based story segmentation
         - Discriminates shots which contain story boundaries
     [Diagram: Input video → shot segmentation → feature extraction → SVM-based story segmentation]
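The slide does not say how the two sets of shot boundaries were merged. One plausible reading is a sorted union in which near-duplicate boundaries are collapsed; the sketch below assumes that reading. The function name and the `min_gap` parameter are hypothetical.

```python
from typing import Iterable, List


def merge_shot_boundaries(
    common: Iterable[float],
    extra: Iterable[float],
    min_gap: float = 0.5,  # assumed: boundaries closer than this are duplicates
) -> List[float]:
    """Union two lists of shot boundary times (seconds), dropping near-duplicates."""
    merged: List[float] = []
    for t in sorted(list(common) + list(extra)):
        # Keep a boundary only if it is far enough from the last one kept.
        if not merged or t - merged[-1] >= min_gap:
            merged.append(t)
    return merged


# Toy data: 10.2 is treated as a duplicate of 10.0.
merged_example = merge_shot_boundaries([0.0, 10.0, 20.0], [10.2, 15.0])
```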

  10. Extracted audio-video features
      - Audio
        - Average RMS
        - Avg RMS of first n frames
        - Frequency of audio class (silence, speech, music, noise)
          - Details in Reference [4]
      - Color
        - Color layout of first, middle, and last frame (6*Y, 3*Cb, 3*Cr)
        - Color layout distance between first, middle and last frames
      - Temporal
        - Shot duration
        - Shot density
      - Motion
        - Horizontal motion
        - Vertical motion
        - Total motion
        - Motion intensity
      - Total number of elements: 51 (51-dimensional “shot vector”)
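The feature list can be laid out as a flat vector. The per-feature sizes below are an assumption, chosen because they add up to the stated 51: audio 6 (two RMS values plus four class frequencies), motion 4, temporal 2, and color 39 (three frames of 6+3+3 = 12 color layout coefficients, plus three pairwise distances). The slide does not confirm this breakdown, and all names in the sketch are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ShotFeatures:
    # Audio (6 values)
    avg_rms: float
    avg_rms_first_n: float
    audio_class_freq: List[float]     # silence, speech, music, noise
    # Motion (4 values)
    horizontal_motion: float
    vertical_motion: float
    total_motion: float
    motion_intensity: float
    # Temporal (2 values)
    shot_duration: float
    shot_density: float
    # Color (39 values under the assumed breakdown)
    color_layouts: List[List[float]]  # first, middle, last frame; 12 coefficients each
    color_layout_dists: List[float]   # assumed: 3 pairwise distances


def shot_vector(f: ShotFeatures) -> List[float]:
    """Concatenate all features of one shot into a single flat list."""
    if len(f.audio_class_freq) != 4:
        raise ValueError("expected 4 audio class frequencies")
    if len(f.color_layouts) != 3 or any(len(c) != 12 for c in f.color_layouts):
        raise ValueError("expected 3 color layouts of 12 coefficients")
    if len(f.color_layout_dists) != 3:
        raise ValueError("expected 3 color layout distances")
    vec = [f.avg_rms, f.avg_rms_first_n, *f.audio_class_freq,
           f.horizontal_motion, f.vertical_motion, f.total_motion,
           f.motion_intensity, f.shot_duration, f.shot_density]
    for layout in f.color_layouts:
        vec.extend(layout)
    vec.extend(f.color_layout_dists)
    return vec


# Placeholder values only; real features would come from the extractors.
example = ShotFeatures(0.2, 0.1, [0.1, 0.7, 0.1, 0.1], 0.0, 0.0, 0.0, 0.0,
                       8.5, 0.3, [[0.0] * 12 for _ in range(3)], [0.0] * 3)
```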

  11. SVM-based story segmentation
      - Apply SVM to discriminate shots w/ story boundary
      - Training phase
        - Shots which contain story boundary ⇒ Positive
        - All other shots ⇒ Negative
      [Diagram: timeline t with story boundaries marked]
      - Evaluation phase
        - Extract N shots based on distance from SVM hyperplane
          - N = Average number of stories in ABC, CNN (Baseline)
          - N = Average number of stories x 1.5 (Extended baseline)
        - Set story boundary at beginning of each extracted shot
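The evaluation phase can be sketched as a top-N selection over decision values. The sketch assumes a linear SVM that has already been trained and is represented only by a weight vector and a bias; the kernel actually used is not stated on the slide. For a linear model, ranking by decision value gives the same order as ranking by signed distance from the hyperplane. All names and numbers are illustrative.

```python
from typing import List, Sequence, Tuple


def decision_value(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Linear SVM decision value w.x + b (proportional to the signed distance)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def boundary_budget(avg_stories: float, factor: float = 1.0) -> int:
    """N for the baseline (factor 1.0) or the extended baseline (factor 1.5)."""
    return int(round(avg_stories * factor))


def select_story_boundaries(
    shots: Sequence[Tuple[float, Sequence[float]]],  # (shot start time, shot vector)
    weights: Sequence[float],
    bias: float,
    n: int,
) -> List[float]:
    """Pick the n shots furthest on the positive side; return their start times."""
    scored = [(decision_value(weights, bias, vec), start) for start, vec in shots]
    top = sorted(scored, key=lambda pair: pair[0], reverse=True)[:n]
    # A story boundary is set at the beginning of each extracted shot.
    return sorted(start for _, start in top)


# Toy 2-dimensional vectors stand in for the 51-dimensional shot vectors.
toy_shots = [(0.0, [0.1, 0.0]), (12.0, [0.9, 0.2]), (30.0, [0.4, 0.1]), (45.0, [0.8, 0.9])]
baseline = select_story_boundaries(toy_shots, [1.0, 1.0], -0.5, boundary_budget(2))
extended = select_story_boundaries(toy_shots, [1.0, 1.0], -0.5, boundary_budget(2, 1.5))
```

Raising N, as in the extended baseline, trades precision for recall: more shots are marked as boundaries, including lower-scoring ones.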

  12. Problems of baseline method
      - Although baseline results were satisfactory, several weak points were observed…
        - Poor recall in various “sections”
          - e.g., Top Stories, Headline Sports of CNN
          - Cause: Different characteristics compared to general content
            - No anchor shots, background music, etc.
            - SVM unable to adapt to various features
        - Impossible to detect multiple story boundaries that occur within a single shot
          - Baseline can only set one story boundary per shot

  13. Additional system components
      - Section-specialized segmentation
        - Objective: Improvement of recall in specific sections which have different characteristics
      - Anchor shot segmentation
        - Objective: Detection of multiple story boundaries which occur within a single shot
      - Post-filter
        - Objective: Improvement of precision

  14. Component 1: Section-specialized segmentation
      [Diagram: system outline, repeated from slide 7]

  15. Section-specialized segmentation
      - General approach:
        - Construct SVM specialized for story segmentation within specified sections
      - Procedures:
        - Section extraction
          - Extraction based on “jingles”, i.e., audio-video sequences which initiate sections
        - Section-specialized SVM
          - Construct SVM specialized to conduct story segmentation on extracted sections
      [Diagram: section extraction → section-specialized SVM]

  16. Section extraction
      - Automatic detection of “jingles” based on reference audio signals
        - Based on “Time-series active search” algorithm [Kashino]
      - Extract sections based on position of extracted jingles
      [Diagram: timeline t with jingle positions (Start: Top Stories; Start: Dollars and Sense; Start: Headline Sports; End: Headline Sports) delimiting the Top Stories and Headline Sports sections]
      - Apply section-specialized SVM to set story boundaries within each extracted section
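Once jingles have been located, turning their positions into sections is a simple pass over the timeline. The sketch below covers only that step and does not implement the time-series active search itself. It assumes each detected jingle carries a time, a section name, and a start or end kind, and that a section runs until the next jingle of any kind. The times are invented for illustration.

```python
from typing import List, NamedTuple, Optional, Sequence, Tuple


class Jingle(NamedTuple):
    time: float    # position of the detected jingle, in seconds
    section: str   # section the jingle belongs to
    kind: str      # "start" or "end"


def extract_sections(
    jingles: Sequence[Jingle], video_end: float
) -> List[Tuple[str, float, float]]:
    """Return (section name, start, end) for each section opened by a start jingle."""
    sections: List[Tuple[str, float, float]] = []
    open_name: Optional[str] = None
    open_start = 0.0
    for j in sorted(jingles, key=lambda item: item.time):
        if open_name is not None:
            # Assumed rule: any jingle closes the section that is currently open.
            sections.append((open_name, open_start, j.time))
            open_name = None
        if j.kind == "start":
            open_name, open_start = j.section, j.time
    if open_name is not None:
        # A section still open at the end runs to the end of the video.
        sections.append((open_name, open_start, video_end))
    return sections


# Jingle order follows the slide's timeline; the times are made up.
detected = [
    Jingle(60.0, "Top Stories", "start"),
    Jingle(300.0, "Dollars and Sense", "start"),
    Jingle(900.0, "Headline Sports", "start"),
    Jingle(1200.0, "Headline Sports", "end"),
]
sections = extract_sections(detected, video_end=1800.0)
```

Each resulting interval would then be handed to the SVM trained for that section.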

  17. Component 2: Anchor shot segmentation
      [Diagram: system outline, repeated from slide 7]

  18. Anchor shot segmentation
      - General approach:
        - Extract shots which are expected to contain multiple stories (anchor shots), and insert additional boundaries
      - Procedures:
        - Anchor shot extraction
          - Construct SVM to discriminate anchor shots based on audio-video features
        - Extraction of “silent sections”
          - Two methods:
            - Audio classification results
            - HMM-based non-speech detector
        - Story boundary addition
          - Insert story boundaries at detected silence sections
      [Diagram: anchor shot extraction → anchor shot segmentation based on “silence” → story boundary addition]
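The boundary addition step can be sketched as an interval check. The sketch assumes anchor shots and silent sections are already available as (start, end) intervals in seconds, considers only silences that lie strictly inside an anchor shot, and places the new boundary at the midpoint of the silence. The midpoint choice and the function name are assumptions; the slide only says boundaries are inserted at detected silence sections.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds


def add_boundaries_in_anchor_shots(
    anchor_shots: Sequence[Interval], silences: Sequence[Interval]
) -> List[float]:
    """Return extra story boundary times for silences inside anchor shots."""
    boundaries: List[float] = []
    for shot_start, shot_end in anchor_shots:
        for sil_start, sil_end in silences:
            # Silences touching the shot edges are skipped, since the shot
            # boundary itself is already a candidate story boundary.
            if shot_start < sil_start and sil_end < shot_end:
                boundaries.append((sil_start + sil_end) / 2)  # assumed: midpoint
    return sorted(boundaries)


# Toy data: only the silence at 120.0 to 121.0 lies inside the anchor shot.
extra_boundaries = add_boundaries_in_anchor_shots(
    [(100.0, 160.0)],
    [(99.5, 100.5), (120.0, 121.0), (200.0, 201.0)],
)
```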

  19. Component 3: Post-filter
      [Diagram: system outline, repeated from slide 7]
