
SLIDE 1

MediaMill TRECVID 2010 18‐11‐2010 http://www.MediaMill.nl 1

Any Hope for Cross-Domain Concept Detection in Internet Video?

Intelligent Systems Lab Amsterdam University of Amsterdam, The Netherlands

Cees G.M. Snoek, Koen E.A. van de Sande, Dennis C. Koelma, & Arnold W.M. Smeulders

Conclusions TRECVID 2009

Multi-frame is true performance booster

– 30% improvement over single-frame baseline
– Time for the community to move on to video analysis

SLIDE 2


Community myths or facts?

Chua et al., ACM Multimedia 2007

– Video search is practically solved and progress has only been incremental

Yang and Hauptmann, ACM CIVR 2008

– Current solutions are weak and generalize poorly

We have done an experiment

Two video search engines from 2006 and 2009

– MediaMill Challenge 2006 system
– MediaMill TRECVID 2009 system

How well do they detect 36 LSCOM concepts?

SLIDE 3


Four video data set mixtures

Data sets: TRECVID 2005 (broadcast news) and TRECVID 2007 (documentary video)

– Within domain: training and testing on the same data set
– Cross domain: training on one data set, testing on the other

Performance doubled in just 3 years

Snoek & Smeulders, IEEE Computer 2010

36 concept detectors

– Even when using training data of different origin
– Vocabulary still limited

SLIDE 4


State-of-the-Art

Snoek et al., TRECVID 2008-2009
Van de Sande et al., PAMI 2010
Van Gemert et al., PAMI 2010

Software available for download at http://colordescriptors.com

GPU is 5 times faster than quad-core CPU (Van de Sande et al., TMM 2011)

Unresolved bottleneck: kernel-SVM

– Prediction cost: # support vectors × cost of kernel computation, where each kernel computation is O(feature dimension)
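To make the bottleneck concrete, here is a minimal Python sketch of kernel-SVM prediction. It is an illustration under stated assumptions, not the MediaMill implementation: the function names are hypothetical, and the exponential χ² kernel form with a `gamma` parameter is one common choice that the slides do not specify. The point is only that every prediction calls the kernel once per support vector, and each kernel call loops over every feature dimension.

```python
import math


def chi2_kernel(x, z, gamma=1.0):
    """Exponential chi-square kernel (assumed form): exp(-gamma * sum((x-z)^2 / (x+z)))."""
    dist = 0.0
    for a, b in zip(x, z):          # one pass over the feature dimensions
        if a + b > 0:
            dist += (a - b) ** 2 / (a + b)
    return math.exp(-gamma * dist)


def svm_decision(x, support_vectors, coeffs, bias, kernel=chi2_kernel):
    """f(x) = sum_l coeff_l * K(x, sv_l) + bias, with one kernel call per support vector."""
    return sum(c * kernel(x, sv) for c, sv in zip(coeffs, support_vectors)) + bias


def prediction_cost(num_support_vectors, feature_dim):
    """Rough operation count per prediction: # support vectors x feature dimension."""
    return num_support_vectors * feature_dim
```

With, say, 1,000 support vectors and a 4,000-dimensional histogram (illustrative numbers, not taken from the slides), `prediction_cost(1000, 4000)` gives 4,000,000 per-dimension operations for a single frame and a single concept.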

SLIDE 5


Our TRECVID 2010 focus

Baseline: TRECVID 2009 system

– 6 extra i-frames per shot ~ 600K frames in test set

Revisit multi-frame for Internet video

Training from multiple domains

– Add 50K labels from TRECVID05-09 ~ 170K frames train set
– Requires efficient prediction

Classification using Intersection Kernel Support Vector Machines is Efficient

Maji et al., CVPR 2008

For the intersection kernel, h_i is piecewise linear and quite smooth (blue plot). We can approximate it with fewer uniformly spaced segments (red plot). Saves time & space!

Slide credit: Subhransu Maji
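The piecewise-linear approximation described above can be sketched in Python as follows. This is a hedged illustration of the idea, not code from Maji et al. or from MediaMill: the function names are hypothetical, features are assumed to be histogram bins scaled to [0, 1], and `coeffs` stands for the signed SVM weights of the support vectors. For the intersection kernel the decision function splits per dimension, f(x) = Σ_i h_i(x_i) + b with h_i(s) = Σ_l coeff_l · min(s, sv_l[i]), so each h_i can be tabulated once at uniformly spaced points and interpolated at prediction time.

```python
def hik_exact(x, support_vectors, coeffs, bias):
    """Exact intersection-kernel SVM decision: cost grows with # support vectors."""
    total = bias
    for c, sv in zip(coeffs, support_vectors):
        total += c * sum(min(a, b) for a, b in zip(x, sv))
    return total


def build_tables(support_vectors, coeffs, num_segments, max_value=1.0):
    """Tabulate h_i at num_segments + 1 uniformly spaced points, per dimension."""
    dim = len(support_vectors[0])
    step = max_value / num_segments
    tables = []
    for i in range(dim):
        row = []
        for k in range(num_segments + 1):
            s = k * step
            row.append(sum(c * min(s, sv[i])
                           for c, sv in zip(coeffs, support_vectors)))
        tables.append(row)
    return tables


def hik_approx(x, tables, bias, num_segments, max_value=1.0):
    """Approximate decision by linear interpolation in the precomputed tables.

    The cost per prediction depends on the feature dimension only,
    not on the number of support vectors.
    """
    step = max_value / num_segments
    total = bias
    for value, row in zip(x, tables):
        pos = min(max(value, 0.0), max_value) / step
        k = min(int(pos), num_segments - 1)
        frac = pos - k
        total += row[k] + frac * (row[k + 1] - row[k])
    return total
```

The tables are built once after training. The approximation is exact whenever the support-vector values fall on the grid points, and otherwise its error shrinks as `num_segments` grows, which is the time and space trade-off the slide refers to.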

SLIDE 6


Experiment 1: Avg vs Max (χ²)

Max multi-frame appears best choice for online video

[Figure: concept probability across a shot, between two shot boundaries, with the keyframe marked. Annotation: Moving object appearance = Emphasis added]
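A minimal Python sketch of the two fusion rules compared in Experiment 1 follows. The function name and the example scores are hypothetical, chosen only to illustrate why max fusion can help when an object appears briefly: a single high-scoring frame dominates the max, while the average is pulled down by the frames where the object is absent.

```python
def fuse_shot_scores(frame_scores, method="max"):
    """Combine per-frame concept probabilities into one shot-level score."""
    if not frame_scores:
        raise ValueError("a shot needs at least one frame score")
    if method == "max":
        return max(frame_scores)
    if method == "avg":
        return sum(frame_scores) / len(frame_scores)
    raise ValueError("method must be 'max' or 'avg'")


# Hypothetical scores for one shot: the keyframe plus 6 extra i-frames.
# The object is visible in the third frame only.
scores = [0.05, 0.10, 0.90, 0.15, 0.05, 0.10, 0.05]

max_score = fuse_shot_scores(scores, "max")  # driven by the best frame
avg_score = fuse_shot_scores(scores, "avg")  # diluted by the other frames
```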

SLIDE 7


[Figure: concept probability across a shot, between two shot boundaries, with the keyframe, Max, and Avg levels marked. Annotation: Moving object appearance = Emphasis added]

Experiment 2: χ² vs HIK (max)

HIK 75 times faster, negligible loss in average precision

Note: we submitted avg…

SLIDE 8


Top 21 results for “hand”

Top 21 results for “protest”

SLIDE 9


Experiment 3: adding labels

At best on par, often worse.

Top 21 results for “hand”

SLIDE 10


Top 21 results for “protest”

TRECVID 2010 results

[Figure legend: MediaMill not submitted, MediaMill submitted, 97 other methods]

When considering submitted runs only

– Best performer for 6 concepts
– Best overall

SLIDE 11


Conclusions TRECVID 2010

Internet video concept detection is feasible

– Use max for effective multi-frame fusion
– Use histogram intersection kernel for fast prediction

We do not know how to exploit extra labeled training samples from other domains

– A good challenge!

Contact info

Cees Snoek
http://staff.science.uva.nl/~cgmsnoek

We are hiring!

– PhDs and Postdoc on video event retrieval

SLIDE 12


References

http://www.mediamill.nl

The MediaMill TRECVID 2008-2010 Semantic Video Search Engine. C.G.M. Snoek et al. Proceedings of the TRECVID Workshop.

Evaluating Color Descriptors for Object and Scene Recognition. K.E.A. van de Sande, Th. Gevers, and C.G.M. Snoek. IEEE Trans. Pattern Analysis and Machine Intelligence, 2010.

On the Surplus Value of Semantic Video Analysis Beyond the Key Frame. C.G.M. Snoek, M. Worring, J.M. Geusebroek, D.C. Koelma, and F.J. Seinstra. Proc. IEEE Int’l Conference on Multimedia & Expo, 2005.

Empowering Visual Categorization with the GPU. K.E.A. van de Sande, T. Gevers, and C.G.M. Snoek. IEEE Trans. Multimedia, 2011.

Classification using Intersection Kernel Support Vector Machines is Efficient. S. Maji, A.C. Berg, and J. Malik. Proc. IEEE CVPR, 2008.

Concept-Based Video Retrieval. C.G.M. Snoek and M. Worring. Foundations and Trends in Information Retrieval, vol. 4(2), pages 215-322, 2009.

Visual-Concept Search Solved? C.G.M. Snoek and A.W.M. Smeulders. IEEE Computer, vol. 43(6), pages 76-78, 2010.