

SLIDE 1

TRECVID-2006: Shot Boundary Detection Task Overview

Alan Smeaton Dublin City University & Paul Over NIST

SLIDE 2

SB Task Definition

Shot boundary detection is a fundamental task in any kind of video content manipulation

Task provides a good entry for groups who wish to “break into” video retrieval and TRECVID gradually

Task is to identify the shot boundaries with their location and type (cut or gradual) in the given video clip(s)

SLIDE 3

SB Task Details

Groups may submit up to 10 runs

Comparison to human-annotated reference (thanks to Jonathan Lasko, again)

Groups were asked to provide some standard information

  • On the processing complexity of each run:

Total runtime in seconds

Total decode time in seconds

Total segmentation time in seconds

Processor description

SLIDE 4

Shot boundary task: Participating groups (26)

  • 1. AIIA Laboratory, Greece
  • 2. AT&T Laboratories, USA
  • 3. Chinese Academy of Sciences / JDL, China
  • 4. City University of Hong Kong, China
  • 5. CLIPS-IMAG, LSR-IMAG, France
  • 6. COST292, EU
  • 7. Curtin University, Australia
  • 8. Dokuz Eylül University, Turkey
  • 9. Florida International University, USA
  • 10. FX Palo Alto Laboratory, USA
  • 11. Helsinki University of Technology, Finland
  • 12. Huazhong U. of Science & Tech., China
  • 13. Indian Institute of Technology, Bombay, India
  • 14. IIT / NCSR Demokritos, Greece
  • 15. KDDI / Tokushima U. / ISM / NII, Japan
  • 16. ETIS, France
  • 17. Motorola Research Lab., USA
  • 18. RMIT University, Australia
  • 19. Tokyo Institute of Technology, Japan
  • 20. Tsinghua University, China
  • 21. University of Marburg, Germany
  • 22. University of Modena and Reggio Emilia, Italy
  • 23. Carleton University (Ottawa), Canada
  • 24. University of Sao Paulo (USP), Brazil
  • 25. Universidad Rey Juan Carlos, Spain
  • 26. Zhejiang University, China

2005 had 21 groups, of whom 9 appear again in 2006

SLIDE 5

Shot boundary data

13 representative news videos

Total frames: 597,043

Total transitions: 3,785

Transition types:

  1,844 (48.7%) cuts (2005: 60.8%)
  1,509 (39.9%) dissolves (2005: 30.5%)
  51 (1.3%) fade-out/-in (2005: 1.8%)
  381 (10.1%) other (2005: 6.9%)

More graduals, which are harder to match

SLIDE 6

Shot boundary data – more short graduals

Short graduals: graduals <= 5 frames in length

Harder to match: treated as “cuts”, but without the 5-frame expansion that other cuts get to absorb differences in decoders

2006 data has more “short graduals”

Short graduals           2003  2004  2005  2006
% of all transitions        2    10    14    24
% of all graduals           7    24    35    47

SLIDE 7

Evaluation Measures

Precision = # Transitions Correctly Reported / # Transitions Reported

Recall = # Transitions Correctly Reported / # Transitions in Reference

Frame Precision = # Frames Correctly Reported in Detected Transitions / # Frames Reported in Detected Transitions

Frame Recall = # Frames Correctly Reported in Detected Transitions / # Frames in Reference Data for Detected Transitions
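These measures, plus the F1 score used on the later runtime-vs-effectiveness slides, can be sketched in a few lines of Python; the run counts in the example are hypothetical:

```python
def precision_recall(reported, reference, correct):
    """Transition-level precision and recall as defined above.

    reported  -- number of transitions the system reported
    reference -- number of transitions in the human-annotated reference
    correct   -- number of reported transitions matching the reference
    """
    precision = correct / reported if reported else 0.0
    recall = correct / reference if reference else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall (used on later slides)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical run: 1800 transitions reported, 1600 of them correct,
# scored against the 3785 reference transitions in the 2006 data.
p, r = precision_recall(reported=1800, reference=3785, correct=1600)
print(p, r, f1(p, r))
```

The frame-level measures have the same shape, just with frame counts restricted to the detected transitions.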

SLIDE 8

Cuts

[Scatter plot: precision vs recall (0.1–1.0) for each group's cut-detection runs, one marker set per group]

SLIDE 9

Cuts (zoomed)

[Same plot zoomed: recall 0.5–1.0, precision 0.5–1.0]

SLIDE 10

Cuts (zoomed again)

[Same plot zoomed further: recall and precision 0.75–1.0]

SLIDE 11

Gradual transitions

[Scatter plot: precision vs recall (0.1–1.0) for each group's gradual-transition runs]

SLIDE 12

Gradual transitions (zoomed)

[Same plot zoomed: recall 0.5–1.0, precision 0.5–1.0]

SLIDE 13

Gradual transitions (Frame-P & -R)

[Scatter plot: frame precision vs frame recall (0.1–1.0) for gradual transitions]

SLIDE 14

Gradual transitions (Frame-P & -R) zoomed

[Same plot zoomed: frame recall 0.5–1.0, frame precision 0.5–1.0]

SLIDE 15

Mean runtime in seconds

[Bar chart: mean total runtime (s) per participant, fastest to slowest. Faster than realtime: SirCy9, MR, hust, KDDI, Chinese U HK, COST292, ATT, UNIMORE, Tsinghua U., Motorola, JDL, USP, RMIT; slower than realtime: FIU, DEU, IIT-Bombay, CLIPS, FXPAL, CU-Uottawa, AIIA, TokyoTech, URJC, ETIS, HUT]

SLIDE 16

Mean runtime in seconds (faster than realtime)

[Bar chart zoomed to 0–20,000 s: mean total runtime for the systems faster than realtime]

SLIDE 17

Mean total runtime vs effectiveness on cuts

(for systems faster than realtime)

[Scatter plot: mean total runtime (seconds) vs average F1 (harmonic mean of precision and recall) for ATT, COST292, Huazhong, JDL (CAS), KDDI.TU.TUT, USaoPaolo, Motorola, Marburg, RMIT, Tsinghua, TokyoInstTech, UniMore]

SLIDE 18

Mean total runtime vs effectiveness on graduals

(for systems faster than realtime)

[Scatter plot: mean total runtime (seconds) vs average F1 (harmonic mean of precision and recall) for the same 12 faster-than-realtime groups]

SLIDE 19

  • 1. AIIA Laboratory

ICASSP2006 paper describes using information from multiple pairs of frames, within a temporal window;

Good for GTs, which it targets

10 runs, varying thresholds

Frame similarity is color based: not histogram bins but intensity of R, G, B; window size

Downsampled frame size to 25%

Performance: several others do better for cuts and also for GTs, but on frame recall/precision (FR/FP) they are better

Computational expense as expected, several times realtime, but novel

SLIDE 20

  • 2. AT&T Laboratories

Built six independent detectors for cuts, fast dissolves (<5 frames), fade-in, fade-out, dissolve, and wipes;

Easy to plug in new detectors;

Fusion of outputs, fuse & resolve conflicts

Each detector is a FSM (details in paper)

Extract color RGB & intensity, histograms, edges, average, variance, skew, flatness, all from a central area of frame -> losing the borders;

Compute frame-to-frame differences for adjacent frames and for frames 6 apart;

Late fusion with prioritisation of detection types;

7th fastest in execution and rates well in performance
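The FSM idea can be illustrated with a minimal sketch; this is not AT&T's actual detector (their paper has the details), and the distance feature and thresholds here are hypothetical:

```python
# Minimal sketch of one FSM-style cut detector: a cut is declared when a
# single large frame-to-frame distance spike is immediately followed by
# a quiet frame; a spike followed by continued activity is discarded as
# motion or noise. Thresholds are illustrative, not AT&T's values.

def detect_cuts(distances, high=0.5, low=0.1):
    cuts = []
    state = "QUIET"
    candidate = None
    for i, d in enumerate(distances):
        if state == "QUIET":
            if d > high:
                state = "SPIKE"      # candidate cut at frame i
                candidate = i
        elif state == "SPIKE":
            if d < low:
                cuts.append(candidate)  # quiet again: confirm the cut
            state = "QUIET"          # either way, leave the spike state
    return cuts

print(detect_cuts([0.02, 0.03, 0.9, 0.02, 0.04, 0.8, 0.7, 0.05]))  # → [2]
```

The appeal of the FSM design is exactly what the slide notes: each transition type gets its own small machine, and new detectors plug in without touching the others.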

SLIDE 21

  • 3. Chinese Academy of Sciences / JDL

2-pass approach … histograms and mutual information

Thresholding to locate possible SBs then a SVM on those candidate areas;

Rationale based on not needing detailed features around every frame;

Needs to improve distinction between GTs and camera motion, which gives false positives;

Histograms are color based

Results deflated by their decoder being 1 frame out of sync with evaluation numbering;
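The two-pass rationale can be sketched as follows; the histogram differences, threshold, and margin are hypothetical stand-ins, and the second-pass SVM on the candidate windows is omitted:

```python
# Sketch of the two-pass idea: a cheap first pass over histogram
# differences flags candidate windows, so the expensive detailed
# features (and the SVM) only run around those frames rather than
# around every frame.

def first_pass(hist_diffs, threshold=0.3, margin=2):
    """Return candidate frame windows where histogram difference spikes."""
    candidates = []
    for i, d in enumerate(hist_diffs):
        if d > threshold:
            candidates.append((max(0, i - margin), i + margin))
    return candidates

hist_diffs = [0.05, 0.04, 0.45, 0.06, 0.03, 0.02, 0.5, 0.04]
print(first_pass(hist_diffs))  # windows around the spikes at frames 2 and 6
```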

SLIDE 22

  • 4. City University of Hong Kong

Used RGB and HSV color spaces;

Compared Euclidean distance, color moments and Earth Mover's Distance (EMD)

EMD best

Used adaptive thresholding, adapting to mean and standard deviations in 11-frame window;

Good for cuts and short GTs;

Separate GT detector;
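A minimal sketch of adaptive thresholding over an 11-frame window, assuming a generic frame-distance series (the multiplier k here is a hypothetical parameter, not their published value):

```python
import statistics

# Sketch of adaptive thresholding: a frame-to-frame distance counts as a
# boundary only if it exceeds the mean of its local 11-frame window by
# k standard deviations, so the threshold tracks local activity instead
# of being fixed globally.

def adaptive_boundaries(distances, window=11, k=2.5):
    half = window // 2
    boundaries = []
    for i, d in enumerate(distances):
        lo, hi = max(0, i - half), min(len(distances), i + half + 1)
        local = distances[lo:hi]
        mu = statistics.mean(local)
        sigma = statistics.pstdev(local)
        if d > mu + k * sigma:
            boundaries.append(i)
    return boundaries

# A lone spike in otherwise quiet footage is flagged; steady motion,
# which raises both the mean and the deviation, is not.
print(adaptive_boundaries([0.1] * 5 + [0.9] + [0.1] * 5))  # → [5]
```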

SLIDE 23

  • 5. CLIPS-IMAG

Same system as in 2004 and 2005, no new training.

Cut detection by image difference with motion compensation and photographic flash detection.

GTs by comparing norms of the first and second temporal derivatives of the images.

Performance worse than previous years;

SLIDE 24

  • 6. COST292

10 sites, 2 involved in SB task;

Used existing detectors from TU Delft and from LaBRI U. Bordeaux, merged outputs;

Delft … spatiotemporal block based analysis based on 3D pixel blocks, not frames or 2D blocks;

LaBRI is the 2005/2004 detector, improved;

Targets I- and P- frames only, in compressed domain

Merging based on intersections and then weighted confidences in each method;

Submitted both individual and combined runs … the combined run scored below the best individual run;

SLIDE 25

  • 7. Curtin University

Late paper ?

SLIDE 26

  • 8. Dokuz Eylül U.

Color histograms, Euclidean distance, differences in RGB for frame-frame, with thresholds;

Used a skip frame interval to skip ahead 5 frames when very similar;

Big reduction in compute time, small loss in accuracy;

Effectiveness needs to be improved;
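The skip-frame idea can be sketched as follows; `dist` and both thresholds are hypothetical stand-ins:

```python
# Sketch of the skip-frame interval: when adjacent frames are nearly
# identical, jump ahead several frames instead of comparing every pair.
# This trades a small risk of stepping over a boundary for a large
# reduction in the number of comparisons on static footage.

def scan_with_skip(frames, dist, similar=0.05, skip=5):
    comparisons = 0
    i = 0
    while i + 1 < len(frames):
        d = dist(frames[i], frames[i + 1])
        comparisons += 1
        if d < similar:
            i += skip      # very similar: unlikely to hold a boundary
        else:
            i += 1         # activity: step frame by frame
    return comparisons
```

On mostly static footage this cuts the comparison count by roughly the skip factor, which matches the slide's "big reduction in compute time, small loss in accuracy".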

SLIDE 27

  • 9. Florida International University

No paper

SLIDE 28

  • 10. FX Palo Alto Laboratory

Builds on 2004 and 2005;

Low-level features (global and block colour histograms) feed into mid-level features (interframe similarity matrices), which feed into a kNN classifier

Used more favorable training data than previous years “used machine generated output from master shot reference of the development set”
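The mid-level similarity matrix can be sketched with toy normalized histograms and histogram intersection; this is illustrative only, as FXPAL's actual features and similarity measure differ:

```python
# Sketch of the mid-level feature: an interframe similarity matrix over
# a window of frames, built here from toy per-frame histograms with
# histogram intersection (1.0 for identical normalized histograms).
# Rows of this matrix become the input to the kNN classifier.

def similarity_matrix(histograms):
    n = len(histograms)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            S[i][j] = sum(min(a, b) for a, b in zip(histograms[i], histograms[j]))
    return S

# Two identical frames score 1.0 against each other; a very different
# third frame scores low against both.
S = similarity_matrix([[0.5, 0.5], [0.5, 0.5], [1.0, 0.0]])
print(S)
```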

SLIDE 29

  • 11. Helsinki University of Technology

Approach is to extract feature vectors from consecutive frames;

Project these on to a 2D self-organizing map (SOM);

Detect GTs and cuts from resulting SOM;

Experimented with cut optimized, GT optimized, blend optimized and different training data sources;

Computationally the most expensive (because of SOMs);

SLIDE 30

  • 12. Huazhong U. of Science & Tech.

No paper

SLIDE 31

  • 13. Indian Institute of Technology, Bombay

Targets false positives from dramatic illumination changes (flashes), shaky camera, and fire/explosions;

Multi-layer filtering to detect candidates based on correlation of intensity features;

Then use Morlet wavelets to filter candidates and a threshold SVM which uses more detailed features

Pixel differences, color histograms, edges, intensity & wavelets;

The best cuts-only and best GTs-only are competitive but the merged combination is not;

SLIDE 32

  • 14. IIT / NCSR Demokritos

Spatial Segmentation

Frame-frame similarities between consecutive frames using Earth Mover’s distance;

Combination of RGB color, adjacent RGB color, center of mass and adjacent gradients;

Independent modeling and detection of cuts and GTs;

Hard cuts OK, GTs weak -- plan to include motion information;

SLIDE 33

  • 15. KDDI / Tokushima U. / ISM / NII

Very fast execution time and among best performances;

Extension of 2005 approach and new detection of long dissolves;

2-stage SVMs with a combination of multiple kernels

Features used are:

Number of in-edges, number of out-edges;

Pixel intensities;

FX-PAL 2004 approach;

Edge change ratio;
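The edge change ratio, one of the listed features, can be sketched as follows; the counts here are toy values, while real systems derive them from dilated edge maps:

```python
# Sketch of the edge change ratio (ECR): the larger of the fraction of
# edge pixels that disappear from the previous frame and the fraction
# that newly appear in the current frame. A cut pushes ECR toward 1;
# gentle motion keeps it low.

def edge_change_ratio(edges_prev, edges_cur, exiting, entering):
    """edges_prev/edges_cur: edge-pixel counts in the two frames;
    exiting/entering: edge pixels that disappear / newly appear."""
    out_ratio = exiting / edges_prev if edges_prev else 0.0
    in_ratio = entering / edges_cur if edges_cur else 0.0
    return max(out_ratio, in_ratio)
```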

SLIDE 34

  • 16. ETIS

SVMs as standard trained classifiers;

Independent cut and GT detectors;

Cuts: features are color histograms, variations on moments for shape description, projection histograms;

GTs - features are illumination variations and global edge information;

Also includes a fade detector;

Trained on Brazilian TV commercials, only 2 min and 2 sec of it?

Computationally the most expensive;

SLIDE 35

  • 17. Motorola Multimedia Research Lab.

No paper

SLIDE 36

  • 18. RMIT University

Building on previous TRECVids

Based on a moving query window yet performance is approx real time;

Performance in 2006 is lower than in previous years, possibly because of harder data, especially on GTs.

HSV color bins for regions of the frame, with weightings for some regions;

SLIDE 37

  • 19. Tokyo Institute of Technology

No paper

SLIDE 38

  • 20. Tsinghua University

Same system as TRECVid 2005 but improved;

Ran the 2006 system on 2005 data, yielding better performance than in 2005, so the system improved;

Yet the 2006 figures are worse than the 2005 figures, so the data is officially harder

Improvements are in the detection of FOIs, flashes and short GTs;

Uses an FOI detector, independent CUT and GT detection, and targets the transitions in video-in-video, which are not SBs;

Possibly the best performance and again, very fast;

SLIDE 39

  • 21. University of Marburg

Unsupervised k-means clustering for Cuts and GTs, extending TRECVid2005 system;

Cuts …

2 different frame dissimilarity measures namely motion-compensated pixel differences and color based histograms

GTs …

Dissimilarities for different frame distances, using the same dissimilarity measures as for cuts; explicit fade detector;

Good for cuts … execution performance ?

Unsupervised approach … “reached a level of robustness and detection quality … (especially) for cuts”

SLIDE 40

  • 22. University of Modena and Reggio Emilia

Follows TRECVid in 2005 (with FSU)

Targets GTs which have linear frame transitions, but it also works for cuts;

Work on determining the range (in frames) and nature of a GT, and on integrating cut and GT detectors;

Works on windows of 60 frames;

Not clear what (frame) similarity is used;

Quite fast;

SLIDE 41

  • 23. Carleton University (Ottawa)

Approach based on tracking image features across frames; if many features drop out of tracking, a shot boundary is likely;

Designed for non-news video … movies, TV, etc.

“features” are corners of edges on the greyscale frames;

Requires registration of corner features across frames;

Needs automatic thresholding to adjust to video type;

Inherently very expensive computationally; includes some “tricks” to reduce time, but still at least 5x realtime;

Very different;

SLIDE 42

  • 24. University of Sao Paulo (USP)

2-step process

  • 1. Compute absolute pixel differences between adjacent frames to detect ‘events’ … any type of large discontinuity or activity in pixels;
  • 2. Histogram intersection difference on candidate areas from (1);

Designed for cuts only;
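A sketch of the two-step pipeline on toy 1-D "frames"; both thresholds are hypothetical:

```python
# Sketch of the two-step USP-style pipeline: cheap absolute pixel
# differences flag "events", then histogram intersection is computed
# only on those candidates. A candidate becomes a cut when the two
# frames share little histogram mass.

def find_cuts(frames, hists, pixel_thr=100, hist_thr=0.5):
    cuts = []
    for i in range(1, len(frames)):
        # step 1: total absolute pixel difference between adjacent frames
        diff = sum(abs(a - b) for a, b in zip(frames[i - 1], frames[i]))
        if diff < pixel_thr:
            continue                  # no event here, skip step 2
        # step 2: histogram intersection on the candidate pair only
        inter = sum(min(a, b) for a, b in zip(hists[i - 1], hists[i]))
        if inter < hist_thr:          # little shared content: a cut
            cuts.append(i)
    return cuts

frames = [[0] * 10, [0] * 10, [255] * 10]
hists = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(find_cuts(frames, hists))  # → [2]
```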

SLIDE 43

  • 25. University Rey Juan Carlos

Builds on TRECVid 2005, fusing color and shape primitives;

Color == 16-bin histogram;

Shape == Zernike moments;

Varied the weighted combinations and found a fusion approach that improved on the individual features in isolation;

Computation of Zernike moments can be expensive;

Interesting result: running the 2006 system on both 2005 and 2006 data showed much poorer performance on the 2006 data;

SLIDE 44

  • 26. Zhejiang University

Fastest system, but a programming error hurt the cuts runs; GTs are better

Paper doesn’t say they did SBD !

SLIDE 45

Observations

Excellent performance on cuts and graduals despite more difficult data

Good effectiveness achievable at significantly less than realtime

Despite the continued introduction of novel approaches, novelty does not equal improvement

Interest in the task seems strong … but …

Seems time to retire this task: what more can we learn?