Introduction to Visual Search and Recognition


SLIDE 1

Perceptual and Sensory Augmented Computing

Visual Search Tutorial

Introduction to Visual Search and Recognition


Global representations: limitations

  • Success may rely on alignment → sensitive to viewpoint
  • All parts of the image or window impact the description → sensitive to occlusion, clutter


SLIDE 2


Local representations

  • Describe component regions or patches separately.
  • Many options for detection & description…


Superpixels [Ren et al.]; Shape context [Belongie 02]; Maximally Stable Extremal Regions [Matas 02]; Geometric Blur [Berg 05]; SIFT [Lowe 99]; Salient regions [Kadir 01]; Harris-Affine [Mikolajczyk 04]; Spin images [Johnson 99]

Recall: Invariant local features

Subset of local feature types designed to be invariant to

  • Scale
  • Translation
  • Rotation
  • Affine transformations
  • Illumination

1) Detect interest points

2) Extract descriptors

Figure: descriptors as d-dimensional vectors, e.g. (x1, x2, …, xd), (y1, y2, …, yd)

[Mikolajczyk 01, Matas 02, Tuytelaars 04, Lowe 99, Kadir 01, …]

SLIDE 3


Recognition with local feature sets

  • Previously, we saw how to use local invariant features + a global spatial model to recognize specific objects, using a planar object assumption.
  • Now, we’ll use local features for:
  • Indexing-based recognition
  • Bags of words representations
  • Correspondence / matching kernels



Basic flow


1) Detect or sample features → list of positions, scales, orientations

2) Describe features → associated list of d-dimensional descriptors

3) Index each one into a pool of descriptors from previously seen images

SLIDE 4


Indexing local features

  • Each patch / region has a descriptor, which is a point in some high-dimensional feature space (e.g., SIFT)


Indexing local features

  • When we see close points in feature space, we have similar descriptors, which indicates similar local content.

Figure credit: A. Zisserman

SLIDE 5


Indexing local features

  • We saw in the previous section how to use voting and pose clustering to identify objects using local features.

Figure credit: David Lowe


Indexing local features

  • With potentially thousands of features per image, and hundreds to millions of images to search, how do we efficiently find the ones that are relevant to a new image?
  • Low-dimensional descriptors: can use standard efficient data structures for nearest neighbor search
  • High-dimensional descriptors: approximate nearest neighbor search methods are more practical
  • Inverted file indexing schemes


SLIDE 6


Indexing local features: approximate nearest neighbor search


  • Best-Bin First (BBF): a variant of k-d trees that uses a priority queue to examine the most promising branches first [Beis & Lowe, CVPR 1997]
  • Locality-Sensitive Hashing (LSH): a randomized hashing technique using hash functions that map similar points to the same bin with high probability [Indyk & Motwani, 1998]
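The hashing idea can be sketched with a minimal random-hyperplane LSH in Python (function names such as `make_hyperplanes` and `build_table` are illustrative, not from the cited papers; a real system would use many tables and wider keys):

```python
import random
from collections import defaultdict

def make_hyperplanes(num_planes, dim, seed=0):
    # Random Gaussian hyperplanes through the origin; each contributes one hash bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

def lsh_key(vec, planes):
    # Hash = which side of each hyperplane the point falls on.
    # Nearby points (small angle between them) agree on most bits.
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
                 for plane in planes)

def build_table(descriptors, planes):
    # Bucket descriptor indices by hash key; a query searches only its own bucket.
    table = defaultdict(list)
    for idx, d in enumerate(descriptors):
        table[lsh_key(d, planes)].append(idx)
    return table
```

A query descriptor is hashed with `lsh_key` and compared only against the descriptors in its bucket, avoiding a linear scan of the whole pool.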


Indexing local features: inverted file index

  • For text documents, an efficient way to find all pages on which a word occurs is to use an index…
  • We want to find all images in which a feature occurs.
  • To use this idea, we’ll need to map our features to “visual words”.

K. Grauman, B. Leibe

SLIDE 7


Visual words: main idea

  • Extract some local features from a number of images …

e.g., SIFT descriptor space: each point is 128-dimensional

Slide credit: D. Nister


SLIDE 10


Visual words: main idea

Map high-dimensional descriptors to tokens/words by quantizing the feature space

  • Quantize via clustering; let cluster centers be the prototype “words”

Figure: descriptor space


Visual words: main idea

Map high-dimensional descriptors to tokens/words by quantizing the feature space

  • Determine which word to assign to each new image region by finding the closest cluster center.

Figure: descriptor space
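Both steps — clustering the descriptor space and assigning new regions to the nearest center — can be sketched with plain Lloyd's k-means (names like `build_vocabulary` are illustrative; real systems cluster millions of 128-d SIFT descriptors, not toy 2-d points):

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two descriptors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_word(centers, desc):
    # The visual word of a descriptor is the index of its nearest cluster center.
    return min(range(len(centers)), key=lambda i: dist2(centers[i], desc))

def build_vocabulary(descriptors, k, iters=10, seed=0):
    # Plain Lloyd's k-means; the resulting centers are the prototype "words".
    rng = random.Random(seed)
    centers = rng.sample(descriptors, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for d in descriptors:
            groups[assign_word(centers, d)].append(d)
        centers = [[sum(c) / len(g) for c in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers
```

After `build_vocabulary` runs once offline, `assign_word` maps any new image region's descriptor to a word id.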

SLIDE 11


Visual words

  • Example: each group of patches belongs to the same visual word

Figure from Sivic & Zisserman, ICCV 2003


Visual words

  • First explored for texture and material representations
  • Texton = cluster center of filter responses over a collection of images
  • Describe textures and materials based on the distribution of prototypical texture elements.

Leung & Malik 1999; Varma & Zisserman 2002; Lazebnik, Schmid & Ponce 2003

SLIDE 12


Visual words

  • More recently used for describing scenes and objects for the sake of indexing or classification.

Sivic & Zisserman 2003; Csurka, Bray, Dance, & Fan 2004; many others.


Inverted file index for images comprised of visual words

Image credit: A. Zisserman

Word number → List of image numbers
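The word-number → image-list structure above can be sketched directly with dictionaries (function names are illustrative; production systems also store term frequencies and apply tf-idf weighting):

```python
from collections import defaultdict

def build_inverted_index(image_words):
    # image_words: {image_id: list of visual word ids occurring in that image}
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index

def query(index, query_words):
    # Score each image by how many distinct query words it contains,
    # touching only the index entries for the query's words.
    scores = defaultdict(int)
    for w in set(query_words):
        for image_id in index.get(w, ()):
            scores[image_id] += 1
    return sorted(scores, key=scores.get, reverse=True)
```

The payoff is that `query` never looks at images sharing no words with the query, which is what makes search over millions of images feasible.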

SLIDE 13


Bags of visual words

  • Summarize the entire image based on its distribution (histogram) of word occurrences.
  • Analogous to the bag of words representation commonly used for documents.

Image credit: Fei-Fei Li
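Building that histogram is a one-pass count over the image's word ids; a minimal sketch (normalization by the total count is one common choice, so images with different numbers of features stay comparable):

```python
def bow_histogram(word_ids, vocab_size):
    # Count occurrences of each visual word, then normalize to sum to 1.
    hist = [0.0] * vocab_size
    for w in word_ids:
        hist[w] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist
```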


Video Google System

1. Collect all words within the query region
2. Use the inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification

Sivic & Zisserman, ICCV 2003

  • Demo online at: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/index.html

Figure: query region and retrieved frames

SLIDE 14


Basic flow

1) Detect or sample features → list of positions, scales, orientations

2) Describe features → associated list of d-dimensional descriptors

3) Index each one into a pool of descriptors from previously seen images, or quantize to form a bag of words vector for the image


Visual vocabulary formation

Issues:

  • Sampling strategy
  • Clustering / quantization algorithm
  • Unsupervised vs. supervised
  • What corpus provides features (universal vocabulary?)
  • Vocabulary size, number of words


SLIDE 15


Sampling strategies

Image credits: F-F. Li, E. Nowak, J. Sivic

Figure: dense uniform sampling; sparse sampling at interest points; random sampling; multiple interest operators

  • To find specific, textured objects, sparse sampling from interest points is often more reliable.
  • Multiple complementary interest operators offer more image coverage.
  • For object categorization, dense sampling offers better coverage.

[See Nowak, Jurie & Triggs, ECCV 2006]

Clustering / quantization methods

  • k-means (typical choice), agglomerative clustering, mean-shift, …
  • Hierarchical clustering: allows faster insertion / word assignment while still allowing large vocabularies
  • Vocabulary tree [Nister & Stewenius, CVPR 2006]


SLIDE 16


Example: Recognition with Vocabulary Tree

  • Tree construction: hierarchical k-means clustering of the descriptors

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]

Vocabulary Tree

  • Training: Filling the tree

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]
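The construction and lookup steps can be sketched as recursive k-means plus greedy descent (a toy sketch with illustrative names, tiny branch factor and depth; the actual system uses e.g. branch factor 10 and millions of leaves):

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=8, seed=0):
    # Plain Lloyd's k-means, reused at every tree node.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: dist2(centers[i], p))].append(p)
        centers = [[sum(c) / len(g) for c in zip(*g)] if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def build_tree(points, branch=2, depth=2):
    # Node = (centers, children); recursively cluster each child's points.
    if depth == 0 or len(points) < branch:
        return None
    centers = kmeans(points, branch)
    groups = [[] for _ in range(branch)]
    for p in points:
        groups[min(range(branch), key=lambda i: dist2(centers[i], p))].append(p)
    return (centers, [build_tree(g, branch, depth - 1) for g in groups])

def quantize(tree, desc, path=()):
    # Greedy descent: compare against only `branch` centers per level, so a
    # lookup costs branch * depth comparisons instead of the vocabulary size.
    if tree is None:
        return path
    centers, children = tree
    i = min(range(len(centers)), key=lambda j: dist2(centers[j], desc))
    return quantize(children[i], desc, path + (i,))
```

The path of branch indices returned by `quantize` serves as the visual word id, which is what makes very large vocabularies affordable at query time.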


SLIDE 19


Vocabulary Tree

  • Recognition
  • RANSAC verification

Slide credit: David Nister

[Nister & Stewenius, CVPR’06]


Vocabulary Tree: Performance

  • Evaluated on large databases: indexing with up to 1M images
  • Online recognition for a database of 50,000 CD covers; retrieval in ~1s
  • Found experimentally that large vocabularies can be beneficial for recognition

[Nister & Stewenius, CVPR’06]

SLIDE 20


Vocabulary formation

  • Ensembles of trees provide additional robustness

Figure credit: F. Jurie

Moosmann, Jurie, & Triggs 2006; Yeh, Lee, & Darrell 2007; Bosch, Zisserman, & Munoz 2007; …


Supervised vocabulary formation

  • Recent work considers how to leverage labeled images when constructing the vocabulary

Perronnin, Dance, Csurka, & Bressan, Adapted Vocabularies for Generic Visual Categorization, ECCV 2006.

SLIDE 21


Learning and recognition with bag of words histograms

  • The bag of words representation makes it possible to describe the unordered point set with a single vector (of fixed dimension across image examples)
  • Provides an easy way to use the distribution of feature types with various learning algorithms requiring vector input.
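As one concrete example of feeding such fixed-length vectors to a learner, here is a nearest-neighbor classifier over bag-of-words histograms using histogram intersection as the similarity (a toy sketch with illustrative names and labels; kernel classifiers such as SVMs with an intersection kernel are another common choice):

```python
def intersection(h1, h2):
    # Histogram intersection: overlap between two normalized BoW histograms.
    return sum(min(a, b) for a, b in zip(h1, h2))

def nn_classify(training_set, query_hist):
    # training_set: list of (bow_histogram, label) pairs.
    # Predict the label of the training example whose histogram
    # overlaps the query histogram the most.
    best_hist, best_label = max(training_set,
                                key=lambda pair: intersection(pair[0], query_hist))
    return best_label
```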


Bags of words: pros and cons

+ flexible to geometry / deformations / viewpoint
+ compact summary of image content
+ provides vector representation for sets
+ has yielded good recognition results in practice

– basic model ignores geometry; must verify afterwards, or encode via features
– background and foreground are mixed when the bag covers the whole image
– interest points or sampling: no guarantee to capture object-level parts