SLIDE 1

Scalable Recognition with a Vocabulary Tree

by: David Nistér and Henrik Stewénius
presented by: William Malpica, CS 395T

Some slides from Nistér and Stewénius’s CVPR 2006 presentation

Outline

Abstract

Strengths

System Overview

Animated explanation of the vocabulary tree

Explanation of the scoring scheme

Testing Results

Conclusion

Scalable Recognition with a Vocabulary Tree

The paper describes a system which can recognize objects from a very large database with great speed and recognition quality.

The system uses local region descriptors which are hierarchically quantized in a vocabulary tree.

Strengths!

The vocabulary tree directly defines the quantization.

Each high-dimension feature vector is quantized into an integer which corresponds to a path in the vocabulary tree.

Results in speed

Feature extraction on a 640x480 video frame in 0.2 sec. and database query in 25 ms on a 50000 image database.

Results in compactness
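The quantization described above can be sketched as a greedy descent: at each level the descriptor is compared with the child centers and follows the closest one, and the sequence of choices is packed into a single integer. This is a minimal sketch, not the authors' code. The `Node` class, the `quantize` function, the toy 2-D tree, the squared Euclidean distance, and the base-k packing of the path are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class Node:
    """One vocabulary-tree node: a cluster center plus its children (hypothetical layout)."""
    center: Sequence[float]
    children: List["Node"] = field(default_factory=list)


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(root: Node, descriptor: Sequence[float], k: int) -> Tuple[List[int], int]:
    """Descend from the root, picking the nearest child at every level.

    Returns the path (child index per level) and the same path packed
    into one integer in base k (an assumed encoding).
    """
    path: List[int] = []
    node = root
    while node.children:
        best = min(range(len(node.children)),
                   key=lambda i: sq_dist(descriptor, node.children[i].center))
        path.append(best)
        node = node.children[best]
    code = 0
    for step in path:
        code = code * k + step
    return path, code


# Toy tree with branch factor k = 2 and depth 2 over 2-D "descriptors".
root = Node(center=(0.0, 0.0), children=[
    Node((-1.0, 0.0), [Node((-1.0, -1.0)), Node((-1.0, 1.0))]),
    Node((1.0, 0.0), [Node((1.0, -1.0)), Node((1.0, 1.0))]),
])

path, code = quantize(root, (0.9, 0.8), k=2)
print(path, code)  # [1, 1] 3
```

In this sketch a lookup costs k distance computations per level, so the work grows with the depth of the tree rather than with the number of leaves.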

Strengths!

Potential for on-the-fly insertion

An offline unsupervised training stage is necessary to create the vocabulary, but new images can be added to the database on-the-fly.

Images can be added at the same rate as feature extraction.

Excellent benefit for large scalable image databases.

Adding, Querying and Removing Images at full speed

[Figure: Add, Remove, Query]

SLIDE 2

Training and Addition are Separate

[Figure: common approach vs. our approach]

System Overview

Maximally Stable Extremal Regions (MSERs) feature extractor.

SIFT feature descriptor

Feature space is quantized through k-means clustering and built into a vocabulary tree.

To retrieve images, a hierarchical scoring scheme is used based on Term Frequency Inverse Document Frequency (TF-IDF).
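One way to read the tree-building step is to run k-means on the training descriptors and then recurse on each resulting cluster down to a fixed depth. The sketch below follows that reading under stated assumptions: random 2-D points stand in for SIFT descriptors, k-means is plain Lloyd iteration with a fixed iteration count, and `kmeans`, `build_tree`, `leaves`, and the dictionary node layout are hypothetical names chosen for this example.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def kmeans(points: List[Vector], k: int, rng: random.Random, iters: int = 20) -> List[List[Vector]]:
    """Plain Lloyd iterations; returns only the non-empty clusters."""
    centers = [list(p) for p in rng.sample(points, k)]
    clusters: List[List[Vector]] = []
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the previous center when a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return [c for c in clusters if c]


def build_tree(points: List[Vector], k: int, depth: int, rng: random.Random) -> Dict:
    """Recursively split the points into at most k children per node."""
    node = {"center": mean(points), "size": len(points), "children": []}
    if depth > 0 and len(points) >= k:
        clusters = kmeans(points, k, rng)
        if len(clusters) > 1:  # stop if the points could not be split
            node["children"] = [build_tree(c, k, depth - 1, rng) for c in clusters]
    return node


def leaves(node: Dict) -> List[Dict]:
    """Collect the leaf nodes (the visual words) below a node."""
    if not node["children"]:
        return [node]
    return [leaf for child in node["children"] for leaf in leaves(child)]


# Toy training set: four blobs of ten 2-D points each.
rng = random.Random(0)
blobs = [(0.0, 0.0), (0.0, 10.0), (10.0, 0.0), (10.0, 10.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in blobs for _ in range(10)]

tree = build_tree(points, k=2, depth=2, rng=rng)
print(len(leaves(tree)), [leaf["size"] for leaf in leaves(tree)])
```

With branch factor k and depth L the tree has at most k^L leaves, while each training step only ever runs k-means with k centers.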

SLIDE 3

SLIDE 4

SLIDE 5

SLIDE 6

Definition of Scoring

Weights are assigned to each node (with certain exceptions):

$w_i = \ln \frac{N}{N_i}$ (1)

Query and database vectors are defined according to their assigned weights:

$q_i = n_i w_i$ (2)

$d_i = m_i w_i$ (3)

Each database image is given a relevance score based on the normalized difference between the query and database vectors:

$s(q, d) = \left\| \frac{q}{\|q\|} - \frac{d}{\|d\|} \right\|$ (4)

Here $n_i$ and $m_i$ are the numbers of descriptor vectors of the query and database image whose path passes through node $i$, $N$ is the number of images in the database, and $N_i$ is the number of database images with at least one descriptor path through node $i$.
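The scoring definition can be made concrete with a small sketch: node weights are the logarithm of the total image count over the number of images that touch the node, each image becomes a weighted count vector, and the score is the norm of the difference of the normalized vectors (lower means more similar). The node ids, image names, and function names below are hypothetical, and the choice of norm is left as a parameter `p` because the slide does not fix it here.

```python
import math
from typing import Dict

Counts = Dict[int, int]      # node id -> number of descriptors through that node
Sparse = Dict[int, float]    # node id -> vector entry


def node_weights(db_counts: Dict[str, Counts]) -> Dict[int, float]:
    """w_i = ln(N / N_i), with N images in total and N_i images touching node i."""
    n_images = len(db_counts)
    images_per_node: Dict[int, int] = {}
    for counts in db_counts.values():
        for node in counts:
            images_per_node[node] = images_per_node.get(node, 0) + 1
    return {node: math.log(n_images / n_i) for node, n_i in images_per_node.items()}


def weighted_unit_vector(counts: Counts, weights: Dict[int, float], p: int) -> Sparse:
    """Entries count * weight, scaled to unit Lp norm; zero entries are dropped."""
    vec = {node: c * weights.get(node, 0.0) for node, c in counts.items()}
    norm = sum(abs(v) ** p for v in vec.values()) ** (1.0 / p)
    if norm == 0.0:
        return {}
    return {node: v / norm for node, v in vec.items() if v != 0.0}


def score(q: Sparse, d: Sparse, p: int) -> float:
    """Lp norm of the difference of two already-normalized sparse vectors."""
    nodes = set(q) | set(d)
    return sum(abs(q.get(i, 0.0) - d.get(i, 0.0)) ** p for i in nodes) ** (1.0 / p)


# Toy database: three images described by counts over four nodes.
db_counts = {
    "a": {0: 2, 1: 1, 3: 1},
    "b": {0: 1, 2: 3, 3: 2},
    "c": {1: 1, 2: 1, 3: 1},
}
weights = node_weights(db_counts)
p = 2
db_vectors = {name: weighted_unit_vector(c, weights, p) for name, c in db_counts.items()}

query = weighted_unit_vector({0: 2, 1: 1, 3: 5}, weights, p)
best = min(db_vectors, key=lambda name: score(query, db_vectors[name], p))
print(best)  # a
```

Node 3 occurs in every toy image, so its weight is ln(1) = 0 and it has no influence on the ranking, which is the usual inverse document frequency behavior.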

Implementation of Scoring

Every node is associated with an inverted file, although only leaf nodes are explicitly represented. The inverted files of inner nodes are the concatenation of the inverted files of the leaf nodes below them.

Inverted files store the id-numbers of the images in which a particular node occurs, and the term frequency for that image.

The vectors representing the database images as well as the query images are normalized to unit magnitude.
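A minimal sketch of leaf-level inverted files follows, assuming integer leaf and image ids. The `InvertedFiles` class and its methods are hypothetical names for this illustration; the inner-node file is produced on demand by merging the files of the leaves below it, which mirrors the idea that only leaf files are stored explicitly.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


class InvertedFiles:
    """Leaf-level inverted files: leaf id -> {image id: term frequency}."""

    def __init__(self) -> None:
        self.files: Dict[int, Dict[int, int]] = defaultdict(dict)

    def add_image(self, image_id: int, leaf_ids: Iterable[int]) -> None:
        """Record how often each leaf occurs among the image's quantized descriptors."""
        for leaf, tf in Counter(leaf_ids).items():
            self.files[leaf][image_id] = tf

    def remove_image(self, image_id: int) -> None:
        """Drop the image from every inverted file (a simple full scan)."""
        for postings in self.files.values():
            postings.pop(image_id, None)

    def inner_node(self, leaves_below: Iterable[int]) -> Dict[int, int]:
        """Virtual inverted file of an inner node: merge the files of its leaves."""
        merged: Counter = Counter()
        for leaf in leaves_below:
            merged.update(self.files.get(leaf, {}))
        return dict(merged)


index = InvertedFiles()
index.add_image(1, [0, 0, 1])   # image 1: leaf 0 twice, leaf 1 once
index.add_image(2, [1, 3])      # image 2: leaves 1 and 3 once each

print(index.files[1])             # {1: 1, 2: 1}
print(index.inner_node([0, 1]))   # {1: 3, 2: 1}
```

Adding an image touches only the files of the leaves its descriptors reach, which is what makes insertion after the offline training stage cheap in this sketch.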

Normalization

To compute the normalized difference in Lp-norm:

$\|q - d\|_p^p = \sum_i |q_i - d_i|^p$ (5)

$= 2 + \sum_{i \mid q_i \neq 0,\, d_i \neq 0} \left( |q_i - d_i|^p - |q_i|^p - |d_i|^p \right)$ (6)

For the case of the L2-norm:

$\|q - d\|_2^2 = 2 - 2 \sum_{i \mid q_i \neq 0,\, d_i \neq 0} q_i d_i$ (7)
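The point of the normalization identities is that, for unit-norm vectors, the distance can be computed from only the dimensions where both vectors are non-zero. The sketch below checks this numerically on toy sparse vectors; the function names and the example values are assumptions for illustration.

```python
from typing import Dict

Sparse = Dict[int, float]  # only non-zero entries are stored


def normalize(v: Sparse, p: int) -> Sparse:
    """Scale a sparse vector to unit Lp norm."""
    norm = sum(abs(x) ** p for x in v.values()) ** (1.0 / p)
    return {i: x / norm for i, x in v.items()}


def dense_distance(q: Sparse, d: Sparse, p: int) -> float:
    """||q - d||_p^p summed over every dimension that appears in q or d."""
    return sum(abs(q.get(i, 0.0) - d.get(i, 0.0)) ** p for i in set(q) | set(d))


def sparse_distance(q: Sparse, d: Sparse, p: int) -> float:
    """Same quantity, touching only dimensions where both vectors are non-zero."""
    shared = set(q) & set(d)
    return 2.0 + sum(abs(q[i] - d[i]) ** p - abs(q[i]) ** p - abs(d[i]) ** p
                     for i in shared)


def sparse_distance_l2(q: Sparse, d: Sparse) -> float:
    """L2 special case: 2 - 2 * (dot product over the shared dimensions)."""
    return 2.0 - 2.0 * sum(q[i] * d[i] for i in set(q) & set(d))


for p in (1, 2):
    q = normalize({0: 3.0, 2: 1.0, 5: 2.0}, p)
    d = normalize({2: 4.0, 5: 1.0, 7: 2.0}, p)
    print(p, dense_distance(q, d, p), sparse_distance(q, d, p))

q2 = normalize({0: 3.0, 2: 1.0, 5: 2.0}, 2)
d2 = normalize({2: 4.0, 5: 1.0, 7: 2.0}, 2)
print(sparse_distance_l2(q2, d2))
```

Because the shared dimensions are exactly those reached through the inverted files of the query's nodes, a query in this scheme never has to visit database entries that share no node with it.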

Testing

Ground truth database consisted of 6376 images in groups of four.

The database was queried with every image and was evaluated on how frequently the other three images are found perfectly.
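One possible reading of this evaluation is sketched below: a query counts as a success only when its top three results are exactly the other three images of its group. The id convention (consecutive blocks of four ids show the same object), the `perfect_retrieval_rate` function, and the toy rankings are assumptions for illustration, not details taken from the slides.

```python
from typing import Dict, List


def perfect_retrieval_rate(rankings: Dict[int, List[int]], group_size: int = 4) -> float:
    """Fraction of queries whose top results are exactly the rest of their group.

    Assumes consecutive blocks of `group_size` image ids belong to one group,
    and that each ranking lists database ids best-first without the query itself.
    """
    hits = 0
    for query, ranked in rankings.items():
        group = query // group_size
        expected = {i for i in range(group * group_size, (group + 1) * group_size)
                    if i != query}
        if set(ranked[:group_size - 1]) == expected:
            hits += 1
    return hits / len(rankings)


# Toy rankings for the four images of group 0 (ids 0 to 3).
rankings = {
    0: [1, 2, 3, 5],   # all three group members on top
    1: [0, 3, 2, 4],   # all three group members on top
    2: [0, 6, 1, 3],   # image 6 from another group intrudes
    3: [2, 1, 0, 7],   # all three group members on top
}
print(perfect_retrieval_rate(rankings))  # 0.75
```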

Results for only 1400 images

SLIDE 7

Results for only 1400 images (continued)

Results with full 6376 image database

Other Tests – 40000 CD covers

Method was tested on a database of 40000 CD covers running in real time.

Other Tests – 1 million images

Method was also tested on a database of 1 million images. The ground truth images were embedded into a database containing all the frames from several movies: The Bourne Identity, The Matrix, Braveheart, Collateral, Resident Evil, Almost Famous and Monsters Inc.

Queries on an 8GB machine would take about 1 second. Database creation took 2.5 days.

Other Tests – 1 million images (continued)

SLIDE 8

Other Tests – Non movie images queried on 300K frames

Conclusion

This methodology provides the ability to make fast searches on extremely large databases.

Paves the way to someday create an internet-