Columbia University TRECVID-2006 High-Level Feature Extraction - - PowerPoint PPT Presentation


Columbia University TRECVID-2006 High-Level Feature Extraction Shih-Fu Chang, Winston Hsu, Wei Jiang, Lyndon Kennedy, Dong Xu, Akira Yanagawa, and Eric Zavesky Digital Video and Multimedia Lab, Columbia University


slide-1
SLIDE 1

Shih-Fu Chang, Winston Hsu, Wei Jiang, Lyndon Kennedy, Dong Xu, Akira Yanagawa, and Eric Zavesky Digital Video and Multimedia Lab, Columbia University http://www.ee.columbia.edu/dvmm

Columbia University TRECVID-2006 High-Level Feature Extraction

slide-2
SLIDE 2

Overview – 5 methods & 6 submitted runs

5 methods:

  • 1. baseline
  • 2. context-based concept fusion
  • 3. lexicon-spatial pyramid matching
  • 4. text feature
  • 5. event detection

6 runs: baseline, context, LSPM, text, visual_concept adaptive, multi-model_concept adaptive (baseline, context, and LSPM are visual-based)

slide-3
SLIDE 3

Overview – performance

[Bar chart: MAP (axis 0.02 to 0.16) for runs A_CL1_1 to A_CL6_6, labeled multi-model_concept adaptive, visual_concept adaptive, text, LSPM, context, baseline; groups: visual-based, visual-text, best visual, best all]

  • context > baseline: context-based concept fusion (CBCF) improves baseline
  • LSPM > context: lexicon-spatial pyramid matching (LSPM) further improves detection
  • text > LSPM: text features improve visual

Every method contributes incrementally to the final detection

slide-4
SLIDE 4

Overview – performance

[Bar chart: MAP (axis 0.02 to 0.16) for runs A_CL1_1 to A_CL6_6, labeled multi-model_concept adaptive, visual_concept adaptive, text, spatial pyramid, context, baseline; groups: visual-based, visual-text, best visual, best all]

  • visual_concept adaptive > LSPM (also > context > baseline): best-of-visual selection works
  • text > multi-model_concept adaptive: best-of-all selection does not work well, probably due to overfitting of the text tool

slide-5
SLIDE 5


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-6
SLIDE 6


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-7
SLIDE 7

Individual Methods: (1) Baseline

Average fusion of two SVM baseline classification results, based on 3 visual features:

  • color moments over 5x5 fixed grid partitions
  • Gabor texture
  • edge direction histogram from the whole image

These features cover coarse local features, layout, and global appearance.

[Diagram: Color, Texture, Edge, … features (Fixed/Global) → (1) Support Vector Machines (SVM)]

slide-8
SLIDE 8

Individual Methods: (1) Baseline

Average fusion of two SVM baseline classification results, based on 3 visual features:

  • color moments over 5x5 fixed grid partitions
  • Gabor texture
  • edge direction histogram from the whole image

[Diagram: Color, Texture, Edge, … features (Fixed/Global) → (2) ensemble classifier]

Yanagawa et al., Tech. Rep., Columbia Univ., 2006, http://www.ee.columbia.edu/dvmm/newPublication.htm

Features and models available for download soon!
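The average fusion step mentioned on the baseline slides can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and it assumes both classifiers already output comparable scores (for example probabilities) for the same list of shots.

```python
def average_fusion(scores_a, scores_b):
    """Average the scores of two classifiers, shot by shot.

    Assumption: both lists score the same shots in the same order and
    are on a comparable scale (e.g. probabilities in [0, 1]).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both classifiers must score the same shots")
    return [(a + b) / 2.0 for a, b in zip(scores_a, scores_b)]


def rank_shots(fused_scores):
    """Return shot indices sorted from highest to lowest fused score."""
    return sorted(range(len(fused_scores)),
                  key=lambda k: fused_scores[k], reverse=True)
```

For example, `rank_shots(average_fusion(svm_1_scores, svm_2_scores))` would give the ranked shot list for one concept, where `svm_1_scores` and `svm_2_scores` are placeholder names for the two classifiers' outputs.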

slide-9
SLIDE 9


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-10
SLIDE 10


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-11
SLIDE 11

Individual Methods: (2) CBCF

Background on Context Fusion

  • “Government-Leader” detector: hard/specific concept (different person, different view, large variance in appearance)
  • “Face” detector: generic concept
  • “Outdoor” detector: generic concept

[Diagram: Outdoor + Face + Government-Leader → Context-based Model, using Context Information]

slide-12
SLIDE 12

Individual Methods: (2) CBCF

Formulation (Naphade et al 2002)

[Diagram: the outdoor, face, and government-leader detectors output P(outdoor|image), P(face|image), and P(government-leader|image); the context-based model takes these as input and outputs updated P(outdoor|image), P(face|image), and P(government-leader|image)]

slide-13
SLIDE 13

Individual Methods: (2) CBCF

Our approach: Discriminative + Generative

Conditional Random Field (Jiang, Chang, et al., ICIP 2006)

[Diagram: the outdoor, face, and government-leader detectors output P(outdoor|image), P(face|image), and P(government-leader|image). In the conditional random field, the observations x_1, x_2, x_3 for image I feed concept nodes C_1, C_2, C_3 (e.g. outdoor, airplane, office), which output the updated posteriors p(y_1 = 1 | X), p(y_2 = 1 | X), p(y_3 = 1 | X)]

slide-14
SLIDE 14

Individual Methods: (2) CBCF

Our approach: Discriminative + Generative

Conditional Random Field

[Diagram: the outdoor, face, and government-leader detectors output P(outdoor|image), P(face|image), and P(government-leader|image). In the conditional random field, the observations x_1, x_2, x_3 for image I feed concept nodes C_1, C_2, C_3, which output the updated posteriors p(y_1 = 1 | X), p(y_2 = 1 | X), p(y_3 = 1 | X)]

$$ \min J = -\prod_{I} \prod_{C_i} p(y_i = 1 \mid X)^{(1+y_i)/2} \, p(y_i = -1 \mid X)^{(1-y_i)/2} $$

J is iteratively minimized by boosting

slide-15
SLIDE 15

Individual Methods: (2) CBCF

$$ \min J = -\prod_{I} \prod_{C_i} p(y_i = 1 \mid X)^{(1+y_i)/2} \, p(y_i = -1 \mid X)^{(1-y_i)/2} $$

J is iteratively minimized by boosting

During each iteration t, two SVM classifiers are trained for each concept:

  • 1. Using input independent detection results
  • 2. Using updated posteriors from iteration t-1

Classifier 2 keeps updating through the iterations and captures inter-conceptual influences. Without classifier 2, the procedure is traditional AdaBoost.
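To make the cost J on this slide concrete, the sketch below evaluates the negative joint likelihood for a small set of images and concepts. It is only an illustration under stated assumptions: the function name is hypothetical, labels are coded as +1/-1, and p(y_i = -1 | X) is taken to be 1 - p(y_i = 1 | X). It does not implement the boosting procedure itself.

```python
def negative_likelihood(posteriors, labels):
    """Evaluate J = -prod_I prod_Ci p(y=1|X)^((1+y)/2) * p(y=-1|X)^((1-y)/2).

    posteriors[n][i]: p(y_i = 1 | X) for image n and concept i.
    labels[n][i]:     ground truth y_i in {+1, -1}.
    Assumption: p(y_i = -1 | X) = 1 - p(y_i = 1 | X).
    """
    likelihood = 1.0
    for p_row, y_row in zip(posteriors, labels):
        for p, y in zip(p_row, y_row):
            # Exactly one exponent is 1 and the other is 0, so each factor
            # is the posterior assigned to the true label.
            likelihood *= p ** ((1 + y) / 2) * (1.0 - p) ** ((1 - y) / 2)
    return -likelihood
```

With one image, two concepts, posteriors `[[0.9, 0.2]]`, and labels `[[1, -1]]`, the likelihood is 0.9 × 0.8, so J is about -0.72; better posteriors push J toward -1.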

slide-16
SLIDE 16


Database & lexicon for context

  • Predefined lexicon to provide context
  • - 374 concepts from LSCOM ontology (observation)

airplane, building, car, boat, person, outdoor, sports, etc

  • Independent detector
  • - our baseline
  • Test concepts
  • - the 39 concepts defined by NIST (update posteriors)

Individual Methods: (2) CBCF

slide-17
SLIDE 17

Individual Methods: (2) CBCF

Experimental results over TRECVID 2005 development set

[Bar chart: AP (axis 0.2 to 1.2) for concepts 1 to 39, independent detector vs. Boosted CRF (context-based fusion)]

24 improve, 15 degrade

slide-18
SLIDE 18

18

Selective Application of Context

  • Not every concept classification benefits

from context-based fusion

  • Is there a way to predict when it works?

Consistent with previous context-based fusion:

IBM: no more than 8 out of 17 concepts gained performance

[Amir et al., TRECVID Workshop, 2003]

Mediamill: 80 out of 101 concepts

[Snoek et al., TRECVID Workshop, 2005]

slide-19
SLIDE 19

Predict When Context Helps

Why may CBCF not help every concept?

  • Strong classifiers may suffer from fusion with weak context
  • Complex inter-conceptual relationships vs. limited training samples

Avoid using CBCF for C_i if C_i is strong and with weak context

Use CBCF for concept C_i if C_i is weak or with strong context:

$$ E(C_i) > \lambda \quad \text{(weak concept)} $$

or

$$ \frac{\sum_{C_j,\, j \neq i} I(C_i; C_j)\, E(C_j)}{\sum_{C_j,\, j \neq i} I(C_i; C_j)} < \beta \quad \text{(strong context)} $$

  • I(C_i; C_j) - mutual information between C_i and C_j
  • E(C_i) - error rate of independent detector for C_i
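The selection rule on this slide can be sketched as a small predicate. This is an illustrative reading of the two conditions, not the authors' implementation: the function name, the argument layout, and the fallback when a concept shares no mutual information with any other concept are assumptions.

```python
def use_cbcf(i, error, mutual_info, lam, beta):
    """Decide whether to apply CBCF to concept i.

    error[j]:          error rate E(C_j) of the independent detector.
    mutual_info[i][j]: mutual information I(C_i; C_j).
    lam, beta:         thresholds (lambda and beta on the slide).
    """
    weak_concept = error[i] > lam
    others = [j for j in range(len(error)) if j != i]
    total_mi = sum(mutual_info[i][j] for j in others)
    if total_mi == 0:
        # Assumption: with no related concepts there is no usable context.
        return weak_concept
    # Mutual-information-weighted error of the context detectors.
    context_error = sum(mutual_info[i][j] * error[j] for j in others) / total_mi
    strong_context = context_error < beta
    return weak_concept or strong_context
```

Raising `lam` or lowering `beta` makes the rule more selective, which is one way to read the "change parameters to predict different number of concepts" experiment that follows.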

slide-20
SLIDE 20

Predict When Context Helps

Change parameters to predict different number of concepts:

# predicted | # concept improved | MAP gain | precision of prediction
9 | 9 | 7.2% | 100%
39 | 24 | 3.0% | 62%
20 | 15 | 9.5% | 75%
16 | 14 | 14% | 88%

slide-21
SLIDE 21

Example

. . . Fighter_Combat Individual House Military

slide-22
SLIDE 22

Independent Detector

Example

slide-23
SLIDE 23


Context-based concept fusion

Example

slide-24
SLIDE 24


Context-based concept fusion

Example

House

slide-25
SLIDE 25


Context-based concept fusion

Example

Positive frames are moved forward with the help of Fighter_Combat

slide-26
SLIDE 26

Context-Based Fusion + Baseline

TRECVID 2005 development set

[Bar chart: axis 0.1 to 1 for concepts 1 to 16, R6 (baseline) vs. R5 (context)]

All get improved!

MAP Gain: 14%

slide-27
SLIDE 27

27

Context-Based Fusion + Baseline

4 concepts

TRECVI D 2006 evaluation

0.05 0.1 0.15 0.2 0.25 0.3 1 2 3 4 AP

baseline context

Similar to results over TRECVI D 2005 set !

slide-28
SLIDE 28

Discussion

Adding context – strong relationship and robust

Quality of context (the smaller the better):

$$ \frac{\sum_{C_j,\, j \neq i} I(C_i; C_j)\, E(C_j)}{\sum_{C_j,\, j \neq i} I(C_i; C_j)} $$

  • Concepts with performance improved: 3.23
  • Concepts with performance degraded: 4.17

slide-29
SLIDE 29


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-30
SLIDE 30


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-31
SLIDE 31

Individual Methods: (3) LSPM

Local features (SIFT) + spatial layout (example regions: sky, water, tree)

  • Spatial Pyramid Matching (SPM) [Lazebnik et al. CVPR, 2006]: multi-resolution histogram matching in spatial domain, bags-of-features
  • Lexicon-Spatial Pyramid Matching (LSPM): SPM matching guided by multi-resolution lexicons

Appropriate size for visual lexicon?

slide-32
SLIDE 32

Individual Methods: (3) LSPM

[Diagram: SIFT features are quantized into visual words t_1, t_2, t_3, t_4, t_5, …, t_n at lexicon level 0; at lexicon level 1 each word t_k is split into t_k_1 and t_k_2 (t_1_1, t_1_2, …, t_n_1, t_n_2)]

slide-33
SLIDE 33

Individual Methods: (3) LSPM

Local features & spatial layout of local features

[Diagram: for Image 1 and Image 2, each visual word t_1, t_2, …, t_n of lexicon level 0 is matched at spatial level 0, spatial level 1, and spatial level 2; the matches are summed (+ + …) into the SPM kernel]

slide-34
SLIDE 34

Individual Methods: (3) LSPM

[Diagram: lexicon level 0 (words t_1, t_2, …, t_n) gives SPM kernel 0 and lexicon level 1 (words t_1_1, t_1_2, …, t_n_1, t_n_2) gives SPM kernel 1; SPM kernel 0 + SPM kernel 1 + … = LSPM kernel → SVM classifier]
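The kernel construction on the LSPM slides can be sketched as follows. The per-level weights inside `spm_kernel` follow the spatial pyramid matching formulation of Lazebnik et al.; how the slides weight the SPM kernels of different lexicon levels is not stated, so `lexicon_weights` is an assumed parameter, and all function names are hypothetical. Features are assumed to be `(x, y, word)` tuples with `x` and `y` normalized to [0, 1].

```python
from collections import Counter


def spatial_histogram(features, level):
    """Count visual words per grid cell; the grid has 2**level cells per side."""
    cells = 2 ** level
    hist = Counter()
    for x, y, word in features:
        cx = min(int(x * cells), cells - 1)
        cy = min(int(y * cells), cells - 1)
        hist[(cx, cy, word)] += 1
    return hist


def intersection(h1, h2):
    """Histogram intersection: number of matched features."""
    return sum(min(count, h2[key]) for key, count in h1.items())


def spm_kernel(feats_1, feats_2, levels=2):
    """Spatial pyramid match kernel for one lexicon (weights as in Lazebnik et al.)."""
    matches = [intersection(spatial_histogram(feats_1, l),
                            spatial_histogram(feats_2, l))
               for l in range(levels + 1)]
    kernel = matches[0] / 2 ** levels
    for l in range(1, levels + 1):
        kernel += matches[l] / 2 ** (levels - l + 1)
    return kernel


def lspm_kernel(feats_1_by_lexicon, feats_2_by_lexicon, lexicon_weights, levels=2):
    """Weighted sum of SPM kernels, one per lexicon level (weights are an assumption)."""
    return sum(w * spm_kernel(a, b, levels)
               for w, a, b in zip(lexicon_weights,
                                  feats_1_by_lexicon, feats_2_by_lexicon))
```

The resulting kernel values between all pairs of training images would form the precomputed kernel matrix passed to the SVM classifier.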

slide-35
SLIDE 35

Individual Methods: (3) LSPM

We apply LSPM to 13 concepts:

flag-us, building, maps, waterscape-waterfront, car, charts, urban, road, boat-ship, vegetation, court, government-leader

Complements baseline by considering local features

[Bar chart: AP (axis 0.05 to 0.3) for the 6 concepts evaluated by NIST, with LSPM vs. without LSPM]

Almost all get improved!

slide-36
SLIDE 36

36

  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-37
SLIDE 37


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-38
SLIDE 38

Individual Methods: (4) Text

Problem: asynchrony between the words being spoken and the visual concepts appearing in the shot

Solution: incorporate associated text from the entire story

  • automatically detected story boundaries [Hsu et al., ADVENT Technical Report, Columbia Univ., 2005]
  • story bag-of-words (term frequency-inverse document frequency)
  • dimension reduction by frequency - top k most frequent words
  • training data: bag-of-words features of stories
  • ground-truth label: a story is positive if one shot is positive
  • classifier: SVM
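The story-level text features described above can be sketched as below. This is a simplified illustration, not the authors' pipeline: tokenization by whitespace, the logarithmic IDF form, and the function names are assumptions, and story boundaries are taken as already given.

```python
import math
from collections import Counter


def story_is_positive(shot_labels):
    """A story is a positive example if at least one of its shots is positive."""
    return any(shot_labels)


def story_tfidf(stories, k):
    """Return (vocabulary, vectors) using only the k most frequent words.

    stories: list of story transcripts (strings).
    Assumptions: whitespace tokenization, idf = log(N / document frequency).
    """
    tokenized = [story.lower().split() for story in stories]
    counts = Counter(word for tokens in tokenized for word in tokens)
    # Dimension reduction by frequency: keep the top-k words.
    vocabulary = [word for word, _ in counts.most_common(k)]
    n_stories = len(tokenized)
    idf = {}
    for word in vocabulary:
        doc_freq = sum(1 for tokens in tokenized if word in tokens)
        idf[word] = math.log(n_stories / doc_freq)
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length = max(len(tokens), 1)  # avoid division by zero for empty stories
        vectors.append([tf[word] / length * idf[word] for word in vocabulary])
    return vocabulary, vectors
```

Each shot would then inherit the feature vector of the story that contains it, which sidesteps the asynchrony between spoken words and the shot where the concept is visible.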

slide-39
SLIDE 39

Individual Methods: (4) Text

[Bar chart: AP (axis 0.05 to 0.5) for concepts 1 to 20, visual only vs. text + visual]

Fusion: 0.2 text + 0.8 visual

MAP Gain 4.5%

slide-40
SLIDE 40

40

  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-41
SLIDE 41


  • Baseline
  • Context-based concept fusion (CBCF)
  • Lexicon-spatial pyramid matching (LSPM)
  • Text features
  • Event detection

Outline – New Algorithms

slide-42
SLIDE 42

Individual Methods: (5) Event

Event detection: Key frame vs. Multiple frames

slide-43
SLIDE 43

Individual Methods: (5) Event

Event detection: Key frame vs. Multiple frames

Earth Mover’s Distance (EMD): minimum weighted distance by linear programming

[Diagram: frames p1, …, pm of shot P (supply) are matched to frames q1, q2, …, qn of shot Q (demand); d_ij is the distance between frames and f_ij the correspondence flow (values 1, 1/2, 1/2 in the example); the result feeds an SVM]

  • Handle temporal shift: a frame at the beginning of P can map to a frame at the end of Q
  • Handle scale variations: a frame from P can map to multiple frames in Q
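A minimal sketch of the EMD between two frame sequences is given below. The slide states that the EMD is solved by linear programming; this sketch instead solves the same transportation problem with a small min-cost-flow routine so that it needs no external solver. It assumes uniform frame weights (1/m for P, 1/n for Q) and a precomputed frame distance matrix; the function name is hypothetical.

```python
def earth_movers_distance(dist):
    """EMD between shots P (m frames) and Q (n frames).

    dist[i][j]: distance d_ij between frame i of P and frame j of Q.
    Assumption: uniform weights, scaled to integers (each P frame supplies
    n units, each Q frame demands m units, total flow m * n).
    """
    m, n = len(dist), len(dist[0])
    source, sink = 0, m + n + 1
    graph = [[] for _ in range(m + n + 2)]
    edges = []  # [to, residual capacity, cost]; edge k ^ 1 is the reverse of k

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges))
        edges.append([v, cap, cost])
        graph[v].append(len(edges))
        edges.append([u, 0, -cost])

    for i in range(m):
        add_edge(source, 1 + i, n, 0.0)
        for j in range(n):
            add_edge(1 + i, 1 + m + j, n, dist[i][j])
    for j in range(n):
        add_edge(1 + m + j, sink, m, 0.0)

    total_cost = 0.0
    inf = float("inf")
    while True:
        # Bellman-Ford shortest path in the residual graph.
        best = [inf] * len(graph)
        prev_edge = [-1] * len(graph)
        best[source] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if best[u] == inf:
                    continue
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > 0 and best[u] + cost < best[v] - 1e-12:
                        best[v] = best[u] + cost
                        prev_edge[v] = k
                        changed = True
            if not changed:
                break
        if best[sink] == inf:
            break  # all supply has been shipped
        # Find the bottleneck along the path, then push flow along it.
        push, v = inf, sink
        while v != source:
            k = prev_edge[v]
            push = min(push, edges[k][1])
            v = edges[k ^ 1][0]
        v = sink
        while v != source:
            k = prev_edge[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        total_cost += push * best[sink]
    return total_cost / (m * n)
```

With `dist = [[5, 0], [0, 5]]` the first frame of P is matched to the last frame of Q at zero cost (temporal shift), and with `dist = [[1, 3]]` the single frame of P is split across both frames of Q (scale variation).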

slide-44
SLIDE 44

Individual Methods: (5) Event

Experimental results: performance over TRECVID 2005 development set

[Bar chart: AP (axis 0.2 to 1), Key Frame vs. EMD]

11 events: airplane_flying, people_marching, car_crash, exiting_car, demonstration_or_protest, election_campaign_greeting, parade, riot, running, shooting, walking

slide-45
SLIDE 45

Conclusion

  • TRECVID 2006 offers a mature opportunity for evaluating concept interaction
  • - We have built 374 concept detectors
  • - Models and features will be released soon
  • Context-Based Fusion
  • - We propose a systematic framework for predicting the effect of context fusion
  • - (TRECVID 2005) 14 out of 16 predicted concepts show performance gain
  • - (TRECVID 2006) 3 out of 4 predicted concepts show performance gain
  • - Promising methodology for scaling up to large-scale systems (374 models)
  • Results from Parts-based model (LSPM) are mixed
  • - But show consistent improvement when fused with SVM baseline
  • - 3 out of 6 concepts improve by more than 10%
  • Temporal event modeling
  • - We propose a novel matching and detection method based on EMD + SVM
  • - Shows consistent gains in 2005 data set
  • - Results in 2006 are incomplete and lower than expected

slide-46
SLIDE 46

  • More information at http://www.ee.columbia.edu
  • Features and models for baseline detectors for 374 LSCOM concepts coming soon