CDVP & TRECVID-2003 News Story Segmentation Task

SLIDE 1

TREC-2003 (Neil O’Hare)

Center for Digital Video Processing

CDVP & TRECVID-2003

News Story Segmentation Task

Csaba Czirjek, Gareth J.F. Jones, Seán Marlow, Noel Murphy, Noel E. O’Connor, Neil O’Hare, Alan F. Smeaton

SLIDE 2

Contents

  • Introduction
    – Structure of News Broadcast
    – System Overview
  • Story Segmentation System
    – Feature Extraction Process
    – Combination of Features using Support Vector Machine
    – Submitted Runs
  • Results
  • Conclusions
SLIDE 3

Structure of a News Broadcast

  • We assume stories are delimited by shots of the anchorperson
  • Features of anchor shots:
    – All anchor shots within a broadcast are taken from the same camera setup
    – They are filmed with a static camera, with little object motion
    – Anchor shots in a single broadcast are visually similar to each other

SLIDE 4

Structure of a News Broadcast

[Figure: structure of a news broadcast. Legend: Anchorperson Shots, News Report Shots, Commercial Break]

SLIDE 5

System Overview

  • We use the TRECVID 2003 common shot boundaries provided by CLIPS-IMAG
  • Extracted features are combined to detect anchor shots
  • Story boundaries are logged at the start of anchor shots
  • The aim is to extract features that are robust to changes across broadcasters (e.g. faces, motion, shot length)
  • This would give a generic news segmentation system

SLIDE 6

System Overview

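[Figure: system pipeline. A 30 Minute News Program passes through Shot Boundary Detection (donated by CLIPS-IMAG), which splits it into shots (numbered 1 to 8 in the diagram). Shot Level Feature Extraction follows: Shot Clustering, Face Detection, Motion Activity Analysis x 2, Shot Length, and Text Segmentation (donated by StreamSage). The features feed a Support Vector Machine, then News Story Detection, which outputs News Stories.]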

SLIDE 7

Feature Extraction 1 - Shot Clustering

  • Shots are clustered based on visual similarity (colour histogram)
  • Anchor shots are grouped together
  • Anchor clusters are identified using heuristics:
    – They tend to be dispersed throughout the broadcast
    – Their average shot length is longer than that of other clusters
    – Anchor shots are very similar to each other: they form ‘tighter’ clusters
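The three cluster heuristics can be illustrated with a minimal sketch. It assumes the shots have already been clustered and only computes one score per heuristic for each cluster. The dictionary keys, the L1 histogram distance and the function name `cluster_scores` are illustrative assumptions; the slides do not say how the heuristics were actually measured or combined.

```python
from statistics import mean


def cluster_scores(shots, labels):
    """Score each cluster on dispersion, average length and tightness.

    shots  : list of dicts with hypothetical keys 'index' (position in the
             broadcast), 'length' (frames) and 'hist' (colour histogram).
    labels : cluster id for each shot, in the same order as shots.
    """
    clusters = {}
    for shot, label in zip(shots, labels):
        clusters.setdefault(label, []).append(shot)

    n_shots = len(shots)
    scores = {}
    for label, members in clusters.items():
        positions = [s["index"] for s in members]
        # Dispersion: fraction of the broadcast spanned by the cluster.
        dispersion = (max(positions) - min(positions)) / max(n_shots - 1, 1)
        # Average shot length in frames.
        avg_length = mean(s["length"] for s in members)
        # Spread: mean L1 distance of member histograms to the centroid
        # (a smaller value means a tighter cluster).
        dims = len(members[0]["hist"])
        centroid = [mean(s["hist"][d] for s in members) for d in range(dims)]
        spread = mean(
            sum(abs(s["hist"][d] - centroid[d]) for d in range(dims))
            for s in members
        )
        scores[label] = {
            "dispersion": dispersion,
            "avg_length": avg_length,
            "spread": spread,
        }
    return scores
```

A cluster with high dispersion, long average length and low spread would be the most anchor-like under these heuristics.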

SLIDE 8

Feature Extraction 2 - Face Detection

  • Coarse-to-fine approach to extract candidate regions:
    – Skin-like pixels are identified based on colour
    – Morphological filtering is used to obtain smoothed areas of connected pixels
    – Shape and size heuristics remove unlikely candidate face regions
  • Candidates are passed to a Principal Component Analysis (PCA) module for final classification
  • Every 12th frame (I-frames) is used for processing
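As a rough illustration of the shape and size step only, the sketch below filters candidate bounding boxes by area and aspect ratio. The function name `filter_candidates` and every threshold are assumptions made for the example; the slides do not give the actual rules or values.

```python
def filter_candidates(regions, min_area=400, min_aspect=0.6, max_aspect=1.8):
    """Keep candidate regions whose size and shape are plausible for a face.

    regions: list of (x, y, width, height) bounding boxes, in pixels.
    All thresholds are illustrative assumptions, not values from the slides.
    """
    kept = []
    for x, y, w, h in regions:
        if w <= 0 or h <= 0:
            continue  # skip degenerate boxes
        aspect = h / w  # height-to-width ratio
        if w * h >= min_area and min_aspect <= aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept
```

The surviving regions would then go to the PCA stage for final classification.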

SLIDE 9

Face Detection

[Figure: face detection pipeline. From the original video file, every 12th frame is taken. Skin filtering + morphological adjustment gives the filtered image; size/shape heuristics give the image with remaining candidates; PCA against a face database gives the detected faces with confidence scores (0.7, 0.5, 0.8 and 0.2 in the example).]

SLIDE 10

Feature Extraction 3 - Activity Measure

  • Motion activity analysis is based on MPEG-1 motion vectors
  • Every P-frame is analysed
  • We count the number of zero-length motion vectors in a P-frame (excluding I-blocks)
  • Activity measure = (No. of zero-length vectors) / (Total no. of macroblocks)
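A minimal sketch of the per-frame measure follows. It assumes the motion vectors have already been decoded into one entry per macroblock, with `None` standing for an intra-coded block (I-block). Counting I-blocks in the denominator is one reading of "Total no. of macroblocks", and the function name is hypothetical.

```python
def activity_measure(macroblocks):
    """Fraction of macroblocks in a P-frame with a zero-length motion vector.

    macroblocks: one entry per macroblock, either a (dx, dy) tuple for an
    inter-coded block or None for an intra-coded block (I-block). I-blocks
    carry no motion vector, so they never count as zero-length, but they
    are assumed here to count towards the total.
    """
    if not macroblocks:
        return 0.0
    zero = sum(1 for mv in macroblocks if mv == (0, 0))
    return zero / len(macroblocks)
```

Under this definition a higher value means a more static frame, since more of its macroblocks did not move.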

SLIDE 11

Feature Extraction 3 - Activity Measure

  • Two separate shot-level measures are used:
    – The least active P-frame is used to represent the shot
    – All motion vectors across a shot are added to form a cumulative motion vector. The activity measure is then calculated using the cumulative motion vector

Example (motion vectors per macroblock):

    frame a                   frame b                   cumulative frame: frame a + frame b
     0,-1    0,1    -3,5       0,1    1,0    -2,4        0,0    1,1    -5,9
     0,0     0,0     4,3   +   3,0    0,0     0,0    =   3,0    0,0     4,3
    -2,1     1,-1    1,0      -2,1    0,1     0,1       -4,2    1,0     1,1
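The two shot-level measures can be sketched as follows. The two frames use the values from the slide's example, with signs chosen so that frame a plus frame b equals the cumulative frame. Reading "least active" as the frame with the highest share of zero-length vectors is an interpretation, I-blocks are ignored for brevity, and all function names are hypothetical.

```python
def zero_fraction(vectors):
    """Share of motion vectors that are exactly (0, 0)."""
    return sum(1 for v in vectors if v == (0, 0)) / len(vectors)


def least_active_frame_measure(frames):
    """Measure 1: represent the shot by its most static P-frame."""
    return max(zero_fraction(f) for f in frames)


def cumulative_measure(frames):
    """Measure 2: add vectors per macroblock position, then score the sum."""
    cumulative = [
        (sum(v[0] for v in block), sum(v[1] for v in block))
        for block in zip(*frames)  # one tuple of vectors per position
    ]
    return zero_fraction(cumulative)


# Example frames, read row by row from the 3 x 3 grids on the slide.
frame_a = [(0, -1), (0, 1), (-3, 5), (0, 0), (0, 0), (4, 3),
           (-2, 1), (1, -1), (1, 0)]
frame_b = [(0, 1), (1, 0), (-2, 4), (3, 0), (0, 0), (0, 0),
           (-2, 1), (0, 1), (0, 1)]

shot = [frame_a, frame_b]
print(least_active_frame_measure(shot), cumulative_measure(shot))
```

In this example two of the nine cumulative vectors are zero: one where opposite vectors cancel (0,-1 plus 0,1) and one where both frames were already static.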

SLIDE 12

Feature Extraction 4 - Shot Length

  • Shot length used as a feature
  • Measured in frames
SLIDE 13

Feature Extraction 5 - Text Analysis

  • To allow us to complete the required runs, we used text analysis provided by StreamSage
  • StreamSage text output is used as a binary feature

SLIDE 14

Combination of Features - SVM

  • Extracted features are combined using a Support Vector Machine
  • Trained on 10 hours of the TRECVID 2003 development set (5 CNN, 5 ABC)
  • The resulting SVM classifier detects anchor shots
  • Story boundaries are logged at the beginning of anchor shots
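The final step, turning per-shot anchor decisions into story boundaries, can be sketched as below. The classifier itself is not shown: `is_anchor` stands in for its per-shot output, and the function name and data layout are assumptions. Whether runs of consecutive anchor shots were merged is not stated in the slides, so this sketch logs a boundary at every anchor shot.

```python
def story_boundaries(shots, is_anchor):
    """Return the start frame of every shot classified as an anchor shot.

    shots     : list of (start_frame, end_frame) tuples in broadcast order.
    is_anchor : list of booleans, e.g. the per-shot output of a classifier.
    """
    return [start for (start, _end), anchor in zip(shots, is_anchor) if anchor]
```

For example, with four shots of which the first and last are classified as anchor shots, the function returns the start frames of those two shots.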
SLIDE 15

Submitted Runs

  • 3 Required Runs
    – A/V only system - generic system for ABC and CNN (DCU03_REQ_AV)
    – A/V + text - generic system for ABC and CNN (DCU03_REQ_AV_TEXT)
    – Text only - text analysis provided by StreamSage (DCU03_REQ_TEXT_ONLY)
  • 2 Additional Optional Runs
    – Specialised systems for ABC and CNN, with separate SVMs for each broadcaster (DCU03_OPT_AV)
    – Clustering algorithm in isolation (DCU03_OPT_CLUSTER)

SLIDE 16

DCU Results

System ID              Recall   Precision
DCU03_REQ_AV           0.328    0.409
DCU03_REQ_AV_TEXT      0.294    0.453
DCU03_REQ_TEXT_ONLY    0.049    0.208
DCU03_OPT_AV           0.313    0.453
DCU03_OPT_CLUSTER      0.364    0.304

SLIDE 17

Overall Results - All Groups

[Figure: recall and precision plot for all groups, both axes running from 0.1 to 1. Groups shown: DCU, Fudan, IBM, kddi, NUS, StreamSage, UCF, Iowa]

SLIDE 18

Conclusions

  • Best results from the specialised system (DCU03_OPT_AV)
  • Generic system not far behind
  • Extracted features are robust across broadcasters
  • Combined results improve precision with a small loss in recall compared to clustering alone

SLIDE 19

Thank You