[PPT] - Context-based Visual Concept Detection PowerPoint Presentation

SLIDE 1

Context-based Visual Concept Detection Using Domain Adaptive Semantic Diffusion

Yu-Gang Jiang‡, Jun Wang‡, Shih-Fu Chang‡, Chong-Wah Ngo†

† VIREO Research Group (VIREO), City University of Hong Kong
‡ Digital Video and Multimedia Lab (DVMM), Columbia University

NIST TRECVID Workshop, Nov. 2009

SLIDE 2

Overview: framework

[Framework diagram: Local Feature and Global Feature -> SVM Classifiers (labeled 6, 5); VIREO-374: 374 LSCOM concept detectors -> Domain Adaptive Semantic Diffusion (labeled 1-4)]

SLIDE 3

Overview: performance

[Chart: mean inferred average precision (axis 0.00 to 0.25) of the 222 system runs, with the "DASD", "Local + global features" and "Local feature alone" runs marked]

Local feature is still the most powerful component (MAP=0.150)
Global features help a little bit (MAP=0.156)
DASD further contributes incrementally to the final detection

SLIDE 4

Overview: framework

[Framework diagram: Local Feature and Global Feature -> SVM Classifiers (labeled 6, 5); VIREO-374: 374 LSCOM concept detectors -> Domain Adaptive Semantic Diffusion (labeled 1-4)]

SLIDE 5

Local feature representation

[Figure: SIFT feature space]

Chang et al., TRECVID 2008; Jiang, Yang, Ngo & Hauptmann, IEEE TMM, to appear

SLIDE 6

Context-based concept detection

[Framework diagram: Local Feature and Global Feature -> SVM Classifiers (labeled 6, 5); VIREO-374: 374 LSCOM concept detectors -> DASD: Domain Adaptive Semantic Diffusion (labeled 1-4)]

SLIDE 7

DASD - motivation

Most existing methods aim at the assignment of concept labels individually
– but concepts do not occur in isolation!

[Example image labels: military personnel, smoke, building, explosion_fire, vehicle, road, outdoor]

SLIDE 8

DASD - motivation

Most existing methods aim at the assignment of concept labels individually
– but concepts do not occur in isolation!

Domain change between training and testing data was not considered

[Figure: Broadcast News Videos vs. Documentary Videos]

SLIDE 9

DASD - overview

[Figure: detection scores of the concepts road, vehicle, water and sky on test samples]

Jiang, Wang, Chang & Ngo, ICCV 2009

SLIDE 10

DASD - overview

Domain adaptive semantic diffusion (DASD)

– Semantic graph
  Nodes are concepts
  Edges represent concept correlation
– Graph diffusion
  Smooth concept detection scores w.r.t. the concept correlation

[Figure: per-sample detection score vectors for the concepts road, vehicle, water and sky]

SLIDE 11

DASD - formulation

Energy function

[Equation annotations: detection score of concept ci on test samples; concept affinity]
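A minimal sketch of what an energy of this kind could look like, assuming the normalized graph-smoothness form commonly used in graph-based diffusion. The symbols are assumptions for illustration and are not taken from the slide: $g(c_i)$ is the vector of detection scores of concept $c_i$ on the test samples, $W_{ij}$ is the affinity between concepts $c_i$ and $c_j$, $d_i$ is the degree of concept $c_i$, and $m$ is the number of concepts.

```latex
\mathcal{E}(g, W) = \frac{1}{2} \sum_{i,j=1}^{m} W_{ij}
  \left\| \frac{g(c_i)}{\sqrt{d_i}} - \frac{g(c_j)}{\sqrt{d_j}} \right\|^{2},
\qquad d_i = \sum_{j=1}^{m} W_{ij}
```

Under this form the energy is small when strongly related concepts (large $W_{ij}$) receive similar degree-normalized scores.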

SLIDE 12

DASD - formulation (cont.)

Gradually smoothing the function brings the detection scores into accordance with the concept relationships

[Equation: detection score smoothing process]
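The smoothing step can be sketched as gradient descent on a graph-smoothness energy. This is an illustration under stated assumptions, not the authors' released code: it assumes the energy has the normalized form tr(G^T (I - D^{-1/2} W D^{-1/2}) G), that the affinity matrix W is symmetric and non-negative, and that the graph is held fixed. The function names, the step size and the iteration count are hypothetical.

```python
import numpy as np


def normalized_laplacian(W):
    """Return L = I - D^{-1/2} W D^{-1/2} for a symmetric, non-negative affinity matrix."""
    W = np.asarray(W, dtype=float)
    degree = W.sum(axis=1)
    if np.any(degree <= 0):
        raise ValueError("every concept needs at least one positive affinity")
    inv_sqrt = 1.0 / np.sqrt(degree)
    S = W * inv_sqrt[:, None] * inv_sqrt[None, :]
    return np.eye(W.shape[0]) - S


def smoothness_energy(G, W):
    """Assumed energy tr(G^T L G); G has one row per concept, one column per test shot."""
    G = np.asarray(G, dtype=float)
    L = normalized_laplacian(W)
    return float(np.trace(G.T @ L @ G))


def diffuse_scores(G, W, step=0.1, iterations=20):
    """Gradient descent on the energy with respect to the scores G (graph W fixed)."""
    L = normalized_laplacian(W)
    G = np.array(G, dtype=float)  # copy, so the caller's scores are left untouched
    for _ in range(iterations):
        # The gradient of tr(G^T L G) is 2 L G; the factor 2 is folded into `step`.
        G -= step * (L @ G)
    return G
```

Each iteration pulls the scores of strongly connected concepts toward each other. The eigenvalues of the normalized Laplacian lie in [0, 2], so with a step of 0.1 an iteration cannot increase the assumed energy.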

SLIDE 13

DASD - formulation (cont.)

Graph adaptation

[Equation: graph adaptation process]

SLIDE 14

Graph adaptation - example

[Figure: six panels showing the edge weights of a semantic graph over the concepts WEAPON, CLOUDS, DESERT, VEHICLE, SKY, CAR and PARKING_LOT at iterations 0, 4, 8, 12, 16 and 20 of graph adaptation. Panel labels: Broadcast news video domain; Documentary video domain. Edge weights change gradually across iterations: for example, one falls from 0.24 to 0.00 and another rises from 0.10 to 0.27.]

SLIDE 15

Experiments on TV ’05-’07

Baseline detectors
– VIREO-374

Graph construction:
– Ground-truth labels on TRECVID 2005

Datasets: TRECVID 05/06 (Broadcast News Videos); TRECVID 07 (Documentary Videos)

[Figure: example concepts shown for the two data sets: SPORTS, WEATHER, WALKING, PEOPLE, MAP, OFFICE, CLASSROOM, BUS, BUILDING, DESERT, MOUNTAIN, PEOPLE-MARCHING, EXPLOSION-FIRE, TRUCK, CORP. LEADER, WATER, POLICE, MILITARY, ANIMAL, TWO PEOPLE, NIGHT TIME, TELEPHONE, STREET]
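One way to build a concept graph from ground-truth labels can be sketched as follows. The slide states only that ground-truth labels on TRECVID 2005 are used. The Pearson correlation measure, the removal of negative correlations and the function name `concept_affinity` are assumptions made for illustration.

```python
import numpy as np


def concept_affinity(labels):
    """Build a concept affinity matrix from binary ground-truth labels.

    labels: array of shape (n_shots, m_concepts), 1 if the concept is present.
    Returns an (m, m) symmetric matrix with a zero diagonal.
    """
    labels = np.asarray(labels, dtype=float)
    # Pearson correlation between concept columns (assumed affinity measure).
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = np.corrcoef(labels, rowvar=False)
    corr = np.nan_to_num(corr)        # concepts with constant labels get 0
    np.fill_diagonal(corr, 0.0)       # no self-edges
    return np.clip(corr, 0.0, None)   # keep positive correlations only (assumption)
```

Concepts that tend to be labeled on the same shots get a strong edge, while concepts that rarely co-occur get none.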

SLIDE 16

Results on TV ’05-’07

Performance gain on TRECVID 05-07

Datasets                  TRECVID-2005   TRECVID-2006   TRECVID-2007
# of evaluated concepts   39             20             20
Baseline (MAP)            0.166          0.154          0.099
SD                        11.8%          15.6%          12.1%
DASD                      11.9%          17.5%          16.2%

SD: semantic diffusion (without graph adaptation)
DASD: domain adaptive semantic diffusion

Consistent improvement over all 3 data sets
Graph adaptation further improves the performance

SLIDE 17

Results on TV ’05-’07 (cont.)

[Chart: average precision on TRECVID 2006 test data, Baseline vs. Semantic Graph Diffusion]

Comparison with the state of the art

TRECVID   Jiang et al.   Aytar et al.   Weng et al.   DASD
2005      2.2%           4.0%           N/A           11.9%
2006      N/A            N/A            16.7%         17.5%

SLIDE 18

Results on TRECVID ’09

[Chart: results of the runs A_vireo.localglobal_5 and A_vireo.dasd20fcs_2 (axis 0.05 to 0.4); plot annotations: 30%, 10%, 5%]

SLIDE 19

Results on TRECVID ’09 (cont.)

Quality of contextual detectors (VIREO-374)

[Chart: mean inferred average precision (axis 0.00 to 0.25) of the 222 system runs, with the TV06, TV07, TV09 and VIREO-374 context detectors marked; annotated DASD performance gains: 5%, 16%, 18%]

SLIDE 20

DASD - computational time

Complexity is O(mn)
– m: # concepts; n: # video shots

Only 2 milliseconds per shot/keyframe!

        TRECVID 05   TRECVID 06   TRECVID 07
SD      59s          84s          12s
DASD    89s          165s         28s

SLIDE 21

Summary

A well-designed approach using local features achieves good results for concept detection.

Context information is helpful!

– Domain adaptive semantic diffusion
  effective for enhancing concept detection accuracy
  can alleviate the effect of data domain changes
  highly efficient!
– Future directions include:
  detector reliability: diffusion over directed graph
  web data annotation: utilize contextual information to improve the quality of tags
– Source code available for download from DVMM lab research page
