Parts-based Concept Detectors Dong-Qing Zhang, Shih-Fu Chang, - - PowerPoint PPT Presentation



SLIDE 1

1 S.F. Chang, Columbia U.

TRECVID 2005 Workshop

Columbia University High-Level Feature Detection: Parts-based Concept Detectors

Dong-Qing Zhang, Shih-Fu Chang, Winston Hsu, Lexing Xie, Eric Zavesky
Digital Video and Multimedia Lab, Columbia University

(In collaboration with IBM Research in ARDA VACE II Project)

SLIDE 2

Data source and design principle

Multi-lingual multi-channel video data
  • 277 videos, 3 languages (ARB, CHN, and ENG)
  • 7 channels, 10+ different programs
  • Poor or missing ASR/MT transcripts

A very broad concept space over diverse content
  • Object, site, people, program, etc.
  • TV05 (10), LSCOM-Lite (39), LSCOM (449)

Concept detection in such a huge space is challenging
  • Need a principled approach
  • Take advantage of the extremely valuable annotation set
  • Data-driven learning based approach offers potential for scalability

SLIDE 3

Insights from Samples: Object - flag

  • Unique object appearance and structure
  • Some even fool the annotator
  • Variations in scale, view, appearance, number
  • Noisy labels
  • Sometimes contextual, spatial cues are helpful for detection, e.g., speaker, stage, sky, crowd

SLIDE 4

Site/location

Again, visual appearance and spatial structures are very useful

SLIDE 5

Activity/Event

  • Visual appearances capture the after-effects of some events (smoke, fire)
  • Sufficient cues for detecting occurrences of events
  • Other events (e.g., people running) need object tracking and recognition

SLIDE 6

Motivation for Spatio-Appearance Models

Many visual concepts are characterized by
  • Unique spatial structures and visual appearances of the objects and sites
  • Joint occurrences of accompanying entities with spatial constraints

This motivates the deeper analysis of spatio-appearance models

SLIDE 7

Spatio-Features: How to sample local features?

Traditional: block-based features
  • Visual appearances of fixed blocks + block locations
  • Suitable for concepts with fixed spatial patterns
  • Support Vector Machine (SVM)

Adaptive sampling: object parts (part-based model)
  • Model appearance at salient points
  • Model part relations
  • Robust against occlusion, background, and location change

(Figure labels: Color Moment; Part; Part relation)
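
As a rough illustration of the block-based option on this slide, the sketch below computes the mean and standard deviation of each color channel for every block of a fixed grid and prepends the block's grid position. The function name, the grid size, and the choice of these two moments are assumptions made for illustration; the slides do not give the exact feature definition.

```python
import math

def grid_color_moments(image, rows=2, cols=2):
    """Return one feature vector per fixed block: [row, col, means..., stds...].

    `image` is a list of pixel rows; each pixel is a tuple of channel values.
    Hypothetical helper: the grid size and moment order are illustrative choices.
    """
    height, width = len(image), len(image[0])
    channels = len(image[0][0])
    features = []
    for r in range(rows):
        for c in range(cols):
            # Pixel bounds of block (r, c); integer division keeps blocks disjoint.
            y0, y1 = r * height // rows, (r + 1) * height // rows
            x0, x1 = c * width // cols, (c + 1) * width // cols
            pixels = [image[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            means = [sum(p[k] for p in pixels) / len(pixels) for k in range(channels)]
            stds = [
                math.sqrt(sum((p[k] - means[k]) ** 2 for p in pixels) / len(pixels))
                for k in range(channels)
            ]
            # Block location comes first, then the appearance moments.
            features.append([r, c] + means + stds)
    return features

# Tiny 2x2 RGB image: one pixel per block, so every standard deviation is 0.
demo = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
blocks = grid_color_moments(demo)
```

Because the block positions are fixed, such a vector can be fed directly to a classifier like an SVM, which is why this style suits concepts whose layout does not move around the frame.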

SLIDE 8

Parts-based object detection paradigm is also related to the Human Vision System (HVS) [Rybak et al. 98]

Image
Pre-attentive stage: eye movement and fixation to get retinal images in local regions
Attentive stage: group retinal images into object
Object

SLIDE 9

Our TRECVID 2005 Objectives

Explore the potential strengths of parts-based models in
  • detecting spatio-dominant concepts
  • fusing with traditional fixed-feature models
  • detecting other interesting patterns such as Near-Duplicates in broadcast news

SLIDE 10

How do we extract and represent parts?

Part detection
  • Maximum Entropy Regions
  • Interest points
  • Segmented Regions

Feature extraction within local parts
  • Gabor filter, PCA projection, color histogram, moments, …

Part-based representation
  • Bag
  • Structural Graph
  • Attributed Relational Graph
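
To make the difference between the "bag" and graph representations concrete, here is a minimal sketch of an attributed relational graph built from detected parts. The class names, the choice of a fully connected graph, and the offset/distance edge attributes are assumptions for illustration only; the slides do not specify them.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Part:
    x: float
    y: float
    feature: tuple  # e.g. a color or texture descriptor of the local region

@dataclass
class ARG:
    nodes: list = field(default_factory=list)
    edges: dict = field(default_factory=dict)  # (i, j) -> relation attributes

def build_arg(parts):
    """Fully connected ARG: each edge stores the spatial offset between two parts."""
    graph = ARG(nodes=list(parts))
    for i in range(len(parts)):
        for j in range(i + 1, len(parts)):
            dx = parts[j].x - parts[i].x
            dy = parts[j].y - parts[i].y
            graph.edges[(i, j)] = {"dx": dx, "dy": dy, "dist": math.hypot(dx, dy)}
    return graph

parts = [Part(0, 0, (0.9, 0.1)), Part(3, 4, (0.2, 0.8)), Part(6, 8, (0.5, 0.5))]
arg = build_arg(parts)

# The "bag" representation keeps only the part features and drops the relations.
bag = [p.feature for p in parts]
```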

SLIDE 11

Representation and Learning

Graph representation of visual content
  • Individual images → salient points, high-entropy regions → Attributed Relational Graph (ARG)
  • Attributes: size, color, texture; relations: spatial relation

Statistical graph representation of model
  • Collection of training images → machine learning → Random Attributed Relational Graph (R-ARG)
  • Statistics of attributes and relations

SLIDE 12

Learning Object Model

(Figure labels: patch image cluster; matching probability; re-estimate)

Challenge: finding the correspondence of parts and computing the matching probability are NP-complete

Solution: apply and develop advanced machine learning techniques, namely Loopy Belief Propagation (LBP) and Gibbs Sampling plus Belief Optimization (GS+BO)

(demo)

SLIDE 13

Role of RARG Model: Explain object generation process

Generative process: from object model to image
  • Object model: Random ARG
  • Object instance (ARG): sampling node occurrence and node/edge features
  • Part-based representation of image (ARG): sampling background pdf and adding background parts

(Figure: object model with nodes numbered 1 to 6)
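
The generative process on this slide can be sketched as a small sampler. This is a toy version under stated assumptions: each model node has an occurrence probability and a one-dimensional Gaussian appearance, the background pdf is taken to be uniform on [0, 1], and edge features are omitted to keep the example short. None of these specifics come from the slides.

```python
import random

def sample_instance(model_nodes, n_background, rng):
    """Sample a list of image parts from a toy random-ARG-like model.

    model_nodes: list of (occurrence_prob, mean_feature, std_feature).
    Background features are uniform on [0, 1] (an assumed background pdf).
    """
    parts = []
    for node_id, (p_occ, mean, std) in enumerate(model_nodes):
        # Node occurrence: a node that is not sampled models an occluded part.
        if rng.random() < p_occ:
            parts.append({"source": node_id, "feature": rng.gauss(mean, std)})
    for _ in range(n_background):
        parts.append({"source": None, "feature": rng.random()})
    # The image does not reveal which part came from which model node.
    rng.shuffle(parts)
    return parts

rng = random.Random(0)
model = [(1.0, 0.2, 0.05), (0.0, 0.5, 0.05), (1.0, 0.8, 0.05)]
instance = sample_instance(model, n_background=2, rng=rng)
```

Detection then amounts to inverting this process: given only the shuffled parts, decide whether some of them were generated by the object model.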

SLIDE 14

Object Detection by Random ARG

Binary detection problem: does the image contain an object (H=1) or not (H=0)?

Likelihood ratio test on the input ARG O, comparing H=1 against H=0

Object likelihood, with the correspondence X modeled by an Association Graph:

$$P(O \mid H=1) = \sum_{X} P(X \mid H=1)\, P(O \mid X, H=1)$$

(Figure: Association Graph linking nodes n_i, n_j of the Random ARG for the object model to nodes n_u, n_v of the ARG for the image, with correspondence variables x_{iu})

  • Probabilities computed by MRF
  • Likelihood ratio can be computed by variational methods (LBP, MC)
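
For very small graphs the object likelihood can be computed exactly by enumerating every correspondence, which also shows why approximate inference is needed as the number of parts grows. The sketch below is a simplified, appearance-only toy: it assumes scalar part features, Gaussian node appearance, a uniform prior over correspondences, a flat background density, and that every model node is matched to a distinct part. These are illustrative assumptions, not the model from the slides.

```python
import math
from itertools import permutations

def gauss_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

def likelihood_ratio(parts, model, background_pdf=1.0):
    """Brute-force P(O|H=1) / P(O|H=0) for a toy appearance-only model.

    parts: list of scalar part features (the input O).
    model: list of (mean, std), one per model node.
    Requires at least as many parts as model nodes.
    """
    # Each correspondence X assigns every model node to a distinct part.
    correspondences = list(permutations(range(len(parts)), len(model)))
    prior = 1.0 / len(correspondences)  # assumed uniform P(X|H=1)
    p_object = 0.0
    for x in correspondences:  # x[i] = index of the part matched to node i
        p = 1.0
        for node, part in enumerate(x):
            p *= gauss_pdf(parts[part], *model[node])
        # Unmatched parts are explained by the background density.
        p *= background_pdf ** (len(parts) - len(model))
        p_object += prior * p
    p_background = background_pdf ** len(parts)
    return p_object / p_background

model = [(0.2, 0.05), (0.8, 0.05)]
ratio_match = likelihood_ratio([0.21, 0.79, 0.5], model)
ratio_clutter = likelihood_ratio([0.5, 0.45, 0.55], model)
```

The number of correspondences grows factorially with the number of parts, so this enumeration is only feasible for toy inputs.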

SLIDE 15

Extension to Multi-view Object Detection

Challenge of multi-view object/scene detection
  • Objects under different views have different structures
  • Part appearances are more diverse

Shared parts are visible from different views

Structure variation could be handled by the Random ARG model (each view covered by a sub-graph)

SLIDE 16

Adding Discriminative Model for Multi-view Concept Detection

Use SVM plus non-linear kernels to model diverse part appearance in multiple views
  • Principle similar to boosting

Previous: part appearance modeling by Gaussian distribution
Now: part appearance modeling by Support Vector Machine

SLIDE 17

Evaluation in TRECVID 2005

SLIDE 18

Parts-based detector performance in TRECVID 2005

  • Parts-based detector consistently improves by more than 10% for all concepts
  • It performs best for spatio-dominant concepts such as “US flag”
  • It complements the discriminant classifiers using fixed features

(Chart: average performance over all concepts; fixed-feature baseline SVM vs. adding parts-based)
(Chart: spatio-dominant concept “US Flag”; fixed-feature baseline SVM vs. adding parts-based)

SLIDE 19

Relative contributions

(Chart: baseline SVM; add parts-based)

Adding text or changing fusion models does not help

SLIDE 20

Other Applications of Parts-Based Model: Detecting Image Near Duplicates (IND)

  • Near duplicates occur frequently in multi-channel broadcast
  • But difficult to detect due to diverse variations (scene change, camera change, digitization)
  • Problem complexity: similarity matching < IND detection < object recognition

Parts-based Stochastic Attributed Relational Graph learning
  • Stochastic graph models the physics of scene transformation
  • Measure IND likelihood ratio
  • (Figure labels: Learning Pool; Learning)

Many near-duplicates in TRECVID 05
  • Duplicate detection is the single most effective tool in our TRECVID 05 Interactive Search

SLIDE 21

Near Duplicate Benchmark Set

(available for download at Columbia Web Site)

SLIDE 22

Examples of Near Duplicate Search in TRECVID 05

SLIDE 23

Application: Concept Search

  • Map text queries to concept detection
  • Use human-defined keywords from concept definitions
  • Measure semantic distance between query and concept
  • Use detection and reliability for subshot documents

Query
  • Query text: “Find shots of a road with one or more cars”
  • Part-of-speech tags give keywords: “road car”
  • Map to concepts: WordNet Resnik semantic similarity against concept metadata (names and definitions)
  • Confidence for each concept in the 39-dimensional concept space: (1.0) road, (0.1) fire, (0.2) sports, (1.0) car, …, (0.6) boat, (0.0) person

Documents (subshots)
  • Concept models: simple SVM, grid color moments, Gabor texture
  • Model reliability: expected AP for each concept
  • Each subshot is a point in the same 39-dimensional concept space, e.g. (0.9) road, (0.1) fire, (0.3) sports, (0.9) car, …, (0.2) boat, (0.1) person

Query and subshot documents are compared by Euclidean distance
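
The concept-search pipeline on this slide can be sketched end to end with toy numbers. The similarity table stands in for a WordNet-based measure, the six concepts stand in for the 39-dimensional space, and scaling each dimension by the detector's reliability before taking the Euclidean distance is an assumption about how "detection and reliability" are combined. All values are made up for illustration.

```python
import math

CONCEPTS = ["road", "fire", "sports", "car", "boat", "person"]

def query_vector(keywords, similarity):
    """Confidence per concept: best similarity between any keyword and the concept."""
    return [max(similarity.get((kw, c), 0.0) for kw in keywords) for c in CONCEPTS]

def rank_subshots(query, subshots, reliability):
    """Sort subshot ids by reliability-weighted Euclidean distance to the query."""
    def distance(scores):
        return math.sqrt(sum(
            (r * (q - s)) ** 2 for q, s, r in zip(query, scores, reliability)
        ))
    return sorted(subshots, key=lambda name: distance(subshots[name]))

# Toy similarity table standing in for a WordNet-based measure (values made up).
similarity = {("road", "road"): 1.0, ("car", "car"): 1.0, ("car", "boat"): 0.6,
              ("road", "sports"): 0.2, ("road", "fire"): 0.1}
query = query_vector(["road", "car"], similarity)

# Detector scores per concept, in CONCEPTS order (made up).
subshots = {
    "shot_a": [0.9, 0.1, 0.3, 0.9, 0.2, 0.1],
    "shot_b": [0.1, 0.8, 0.2, 0.1, 0.1, 0.7],
}
# Hypothetical per-concept reliability, e.g. an expected AP for each detector.
reliability = [0.8, 0.5, 0.6, 0.7, 0.4, 0.9]
ranking = rank_subshots(query, subshots, reliability)
```

Weighting by reliability means a mismatch on a poorly performing detector costs less than a mismatch on a trustworthy one.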

SLIDE 24

Concept Search

Automatic - help queries with related concepts

  Method        AP
  Story Text    .169
  CBIR          .002
  Concept       .115
  Fused         .195
“Find shots of boats.”

  Method        AP
  Story Text    .053
  CBIR          .009
  Concept       .090
  Fused         .095
“Find shots of a road with one or more cars.”

Manual / Interactive

Manual keyword selection allows more relationships to be found.

  • Query text: “Find shots of an office setting, i.e., one or more desks/tables and one or more computers and one or more people”
    Concepts: Office
  • Query text: “Find shots of a graphic map of Iraq, location of Bagdhad marked - not a weather map”
    Concepts: Map
  • Query text: “Find shots of one or more people entering or leaving a building”
    Concepts: Person, Building, Urban
  • Query text: “Find shots of people with banners or signs”
    Concepts: March or protest

SLIDE 25

Columbia Video Search Engine System Overview

http://www.ee.columbia.edu/cuvidsearch

Multi-modal Search Tools
  • combined text-concept search
  • story-based browsing
  • near-duplicate browsing

Content Exploitation
  • multi-modal feature extraction
  • story segmentation
  • semantic concept detection

User Level Search Objects
  • Query topic class mining
  • Cue-X reranking
  • Interactive activity log

(System diagram labels: video, speech, text; feature extraction (text, video, prosody); automatic story segmentation; concept detection; near-duplicate detection; text search; concept search; image matching; story browsing; near-duplicate search; automatic/manual search; interactive search; cue-X re-ranking; mining query topic classes; user search pattern mining)

Demo in the poster session

SLIDE 26

Search User Interface

SLIDE 27

Conclusions

  • Parts-based models are intuitive and general
  • Effective for concepts with strong spatio-appearance cues
  • Complementary with fixed feature classifiers (e.g., SVM)
  • Semi-supervised: the same image-level annotations are sufficient, no need for part-level labels
  • Parts models are also useful for detecting near duplicates in multi-source news
  • Valuable for interactive search