Instance-level Recognition Pingmei Xu Object Recognition Friends - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Instance-level Recognition

Pingmei Xu

slide-2
SLIDE 2

Object Recognition

Friends SE01EP02

slide-3
SLIDE 3

Recognition: Find the Ring!

Friends SE01EP02

slide-4
SLIDE 4

Recognition: Find the Ring!

Friends SE01EP02

Instance Recognition

slide-5
SLIDE 5

Recognition: Find the Ring!

Friends SE01EP02

Category Recognition

slide-6
SLIDE 6

Recognition: Find the Ring!

Friends SE01EP02

Recognition Algorithm

slide-7
SLIDE 7

Recognition: Find the Ring!

Friends SE01EP02

Scene Understanding

slide-8
SLIDE 8

History of Ideas in Recognition

Some Slides are borrowed from Svetlana Lazebnik

slide-9
SLIDE 9

The Religious Wars

  • …and the standard answer: probably both or neither

Alexei Efros

Geometry vs. Appearance; Parts vs. The Whole

slide-10
SLIDE 10

1960s ~ late 1990s

the Geometric Era

slide-11
SLIDE 11

Blocks World (1960s)

  • Constrained 3D scene models to allow object recognition from very simple image features.

L.G. Roberts

slide-12
SLIDE 12

Generalized Cylinders (1970s)

  • Representing 3D shapes and parts in terms of “Generalized Cylinders”.

  • T. Binford
slide-13
SLIDE 13

Recognition by Parts

  • Geons: shape primitives + deformations, with predictable edge properties under perspective.

Biederman (1987) Pentland (1986)

slide-14
SLIDE 14

Recognition by Parts

  • Hypothesis: there is a small number of geometric components that constitute the primitive elements of the object recognition system.

Biederman (1987)

Primitives (geons) Objects

slide-15
SLIDE 15

Parts + Spatial Configurations

  • There is more to shape than just the right part primitives: their spatial relationships matter too.

Fischler & Elschlager (1973)

slide-16
SLIDE 16

Alignment

Huttenlocher & Ullman (1987)

slide-17
SLIDE 17

1990s

Appearance-Based

slide-18
SLIDE 18

Eigenfaces

Turk & Pentland (1991)

slide-19
SLIDE 19

Color Histogram

Swain & Ballard (1991)

slide-20
SLIDE 20

1990s ~ present

Sliding Window

slide-21
SLIDE 21

Sliding Window Approaches

Viola & Jones (2000) Dalal & Triggs (2005)

[Figure panels: HOG feature map, Template, Response map]

slide-22
SLIDE 22

late 1990s ~ present

Local Features

slide-23
SLIDE 23

Local Features

  • D. Lowe (1999, 2004)
slide-24
SLIDE 24

early 2000s ~ present

Parts and Shape

slide-25
SLIDE 25

Constellation Models

Weber, Welling & Perona (2000), Fergus, Perona & Zisserman (2003)

slide-26
SLIDE 26

Deformable Part Model

Felzenszwalb, Girshick, McAllester & Ramanan (2009)

slide-27
SLIDE 27

mid 2000s ~ present

Bag of Features

slide-28
SLIDE 28

Bag-of-features Models

Objects Bag of “words”

slide-29
SLIDE 29

Present Trends

slide-30
SLIDE 30

Data-driven Method

Malisiewicz et al. (2011)

slide-31
SLIDE 31

Recognition from RGBD Images

Shotton et al. (2011)

slide-32
SLIDE 32

Deep Learning

http://deeplearning.net/

slide-33
SLIDE 33

History of Ideas in Recognition

  • 1960s – late 1990s: the geometric era
  • 1990s: appearance-based models
  • 1990s – present: sliding window approaches
  • late 1990s – present: local features
  • early 2000s – present: parts-and-shape models
  • mid 2000s – present: bags of features
  • present trends: “big data”, context, attributes, combining geometry and recognition, advanced scene understanding tasks, deep learning

slide-34
SLIDE 34

History of Ideas in Recognition

?

THE COMPUTER VISION EVOLUTION

slide-35
SLIDE 35

3D Object Modeling and Recognition

[Figure panels: A test image; Instances of 5 models]

Rothganger, Lazebnik, Schmid, & Ponce (2006)

slide-36
SLIDE 36

3D Modeling: Pairwise Matching

slide-37
SLIDE 37

Affine Patches

  • Idea: although smooth surfaces are almost never planar in the large, they are always planar in the small.

STEP 1: Detection. Detect salient image regions.

STEP 2: Description. Extract a descriptor.

slide-38
SLIDE 38

Affine Patches

Harris-Laplacian DoG

slide-39
SLIDE 39

Affine Patches

slide-40
SLIDE 40

Affine Patches

slide-41
SLIDE 41

Affine Patches

slide-42
SLIDE 42

Affine Patches

slide-43
SLIDE 43

Affine Patches

  • Patch rectification
slide-44
SLIDE 44

Geometric Constraints

slide-45
SLIDE 45

3D Object Modeling: Matching Procedure

RANSAC: 1) sampling stage 2) consensus stage
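
The two RANSAC stages can be sketched on a toy problem. This is a minimal sketch, not the matching procedure of the paper: it assumes matches are 2D point pairs related by a similarity transform w = a*z + b (complex numbers), and the names `ransac_similarity`, `n_iters` and `tol` are illustrative.

```python
import random

def ransac_similarity(matches, n_iters=200, tol=1.0, seed=0):
    """Fit w = a*z + b to ((x, y), (u, v)) matches; returns (a, b, inlier indices)."""
    rng = random.Random(seed)
    pts = [(complex(*p), complex(*q)) for p, q in matches]
    best = (None, None, [])
    for _ in range(n_iters):
        # 1) Sampling stage: draw a minimal seed (two matches) and fit a model to it.
        (z1, w1), (z2, w2) = rng.sample(pts, 2)
        if z1 == z2:
            continue  # degenerate seed, cannot determine the transform
        a = (w1 - w2) / (z1 - z2)
        b = w1 - a * z1
        # 2) Consensus stage: collect all matches that agree with the model.
        inliers = [i for i, (z, w) in enumerate(pts) if abs(a * z + b - w) <= tol]
        if len(inliers) > len(best[2]):
            best = (a, b, inliers)
    return best

# Toy data: 8 matches following a = 2j, b = 1 + 1j, plus 3 gross outliers.
good = [((x, y), (1 - 2 * y, 1 + 2 * x))
        for x, y in [(0, 0), (1, 0), (0, 1), (2, 3), (4, 1), (3, 3), (5, 2), (1, 4)]]
bad = [((6, 6), (50, -20)), ((7, 1), (-30, 40)), ((2, 8), (9, 99))]
a, b, inliers = ransac_similarity(good + bad)
print(len(inliers), "inliers")
```

In practice the model, the seed size and the tolerance depend on the geometric constraints being enforced.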

slide-46
SLIDE 46

Application: 3D Object Modeling

slide-47
SLIDE 47

3D Modeling: Input Images

  • The 20 images used to construct the teddy bear model.
slide-48
SLIDE 48

3D Modeling: Partial Model from Image Pairs

  • Matches between two images.
slide-49
SLIDE 49

3D Modeling: Partial Model→Composite Ones

Convert a collection of matches to a 3D model

  • 1. Chaining
  • 2. Stitching
  • 3. Bundle adjustment
  • 4. Euclidean upgrade
slide-50
SLIDE 50

3D Modeling: Partial→Composite: Chaining

Chaining: link matches across multiple images.

  • Construction of the patch-view matrix.

A (subsampled) patch-view matrix for the teddy bear. Each black square indicates the presence of a given patch in a given image.
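
A rough sketch of chaining: pairwise matches are linked into tracks with a union-find structure, and each track becomes one row of a binary patch-view matrix. The data layout (patches identified as `(image index, patch id)` tuples) and the function name `chain_matches` are assumptions made for illustration.

```python
def chain_matches(pairwise_matches, n_images):
    """Link pairwise matches into tracks and return a binary patch-view matrix.

    pairwise_matches: list of ((image_a, patch_a), (image_b, patch_b)) pairs.
    Each returned row is one track; entry j is 1 if the track is seen in image j.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairwise_matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    tracks = {}
    for node in list(parent):
        tracks.setdefault(find(node), set()).add(node)

    rows = []
    for members in tracks.values():
        row = [0] * n_images
        for image, _patch in members:
            row[image] = 1
        rows.append(row)
    return sorted(rows, reverse=True)

matches = [((0, "a"), (1, "b")), ((1, "b"), (2, "c")), ((2, "d"), (3, "e"))]
patch_view = chain_matches(matches, n_images=4)
for row in patch_view:
    print(row)
```

This sketch does not check for inconsistent chains (two different patches of the same image ending up in one track), which a real system would have to resolve.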

slide-51
SLIDE 51

3D Modeling: Partial→Composite: Stitching

Stitching: solve for the affine structure and motion while coping with missing data.

Common patches of adjacent modeling views presented in a common coordinate frame.

slide-52
SLIDE 52

3D Modeling: Partial→Composite: Bundle Adjustment

Bundle adjustment: refine the model using non-linear least squares.

The bear model along with the recovered affine camera configurations.

slide-53
SLIDE 53

3D Modeling: Object Gallery

slide-54
SLIDE 54

Application: 3D Object Recognition

slide-55
SLIDE 55

3D Object Recognition: Select Potential Matches

Features: 1) a measure of the contrast (average squared gradient norm) in the patch 2) a 10 × 10 color histogram drawn from the UV portion of YUV space, and 3) SIFT
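
The first two features can be sketched as follows. This is an illustrative sketch under assumptions: gradients are taken by central differences on interior pixels, RGB values lie in [0, 1], and the RGB-to-YUV coefficients are the common BT.601 ones; the paper's exact choices may differ.

```python
def contrast(patch):
    """Average squared gradient norm over interior pixels of a 2D gray patch (at least 3x3)."""
    h, w = len(patch), len(patch[0])
    total, count = 0.0, 0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (patch[y][x + 1] - patch[y][x - 1]) / 2.0
            gy = (patch[y + 1][x] - patch[y - 1][x]) / 2.0
            total += gx * gx + gy * gy
            count += 1
    return total / count

U_MAX, V_MAX = 0.436, 0.615  # ranges of U and V for RGB in [0, 1]

def uv_histogram(rgb_pixels, bins=10):
    """Normalized bins x bins histogram over the UV plane of YUV space."""
    hist = [[0.0] * bins for _ in range(bins)]
    for r, g, b in rgb_pixels:
        u = -0.14713 * r - 0.28886 * g + 0.436 * b
        v = 0.615 * r - 0.51499 * g - 0.10001 * b
        iu = min(bins - 1, max(0, int((u + U_MAX) / (2 * U_MAX) * bins)))
        iv = min(bins - 1, max(0, int((v + V_MAX) / (2 * V_MAX) * bins)))
        hist[iu][iv] += 1.0 / len(rgb_pixels)
    return hist

ramp = [[2 * x for x in range(4)] for _ in range(4)]  # horizontal intensity ramp
print(contrast(ramp))
hist = uv_histogram([(0.0, 0.0, 1.0), (1.0, 0.0, 0.0), (0.2, 0.7, 0.1)])
```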

slide-56
SLIDE 56

3D Object Recognition: Robust Estimation

  • RANSAC
  • Greedy

Here |P| denotes the size of the set P of match hypotheses, K is the number of best matches kept per model patch, M is the number of samples drawn, and N is the size of one seed.

slide-57
SLIDE 57

3D Object Recognition: Object Detection

  • Criteria used to decide whether the object is present or not:
  • Measure of distortion: reflects how close this matrix is to the top part of a scaled rotation.
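
One plausible way to score such a distortion, given as a hedged sketch: the top two rows of a scaled rotation are orthogonal and have equal norms, so the score below adds a non-orthogonality term and a norm-mismatch term. The formula and the name `distortion` are assumptions for illustration and may not match the paper's definition exactly.

```python
import math

def distortion(a1, a2):
    """0 when rows a1, a2 are orthogonal with equal norms; grows as they deviate."""
    n1 = math.sqrt(sum(x * x for x in a1))
    n2 = math.sqrt(sum(x * x for x in a2))
    dot = sum(x * y for x, y in zip(a1, a2))
    non_orthogonality = abs(dot) / (n1 * n2)
    norm_mismatch = 1.0 - min(n1, n2) / max(n1, n2)
    return non_orthogonality + norm_mismatch

print(distortion((2, 0, 0), (0, 2, 0)))  # rows of a scaled rotation
print(distortion((1, 0, 0), (0, 2, 0)))  # unequal scaling of the two axes
```

A detection would then be accepted only if the score stays below some threshold, alongside the other criteria.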

slide-58
SLIDE 58

3D Object Recognition: Successful Examples

slide-59
SLIDE 59

3D Object Recognition: Failed Examples

slide-60
SLIDE 60

3D Object Modeling and Recognition

  • Paper: 3D Object Modeling and Recognition Using Local Affine-Invariant Image Descriptors and Multi-View Spatial Constraints. F. Rothganger, S. Lazebnik, C. Schmid, and J. Ponce, IJCV 2006

Input → Pairwise Matching → 3D Modeling → Test Image → 3D Recognition

slide-61
SLIDE 61

Large Scale Retrieval

slide-62
SLIDE 62

Large Scale Image Retrieval

  • Combining local features, indexing, and spatial constraints

  • K. Grauman and B. Leibe
slide-63
SLIDE 63

Video Google (VG)

  • Is the text retrieval approach applicable to object recognition?

Query Search Results

J Sivic et al. (2003), Philbin et al. (2007, 2008), Chum et al. (2007)

slide-64
SLIDE 64

Text Retrieval

Word, Stem, Document, Corpus

slide-65
SLIDE 65

The Visual Analogy

?, ?, Frame/Image, Film/Image Set

slide-66
SLIDE 66

VG: Local Descriptor

  • Viewpoint covariant regions: 1) 'Maximally Stable' (yellow) 2) 'Shape Adapted' (cyan)

  • 128-dimensional SIFT
slide-67
SLIDE 67

The Visual Analogy

Descriptor, ?, Frame/Image, Film/Image Set

slide-68
SLIDE 68

VG: Visual Vocabulary

  • Vector quantize the descriptors into clusters by k-means.

Affine covariant regions Clusters
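
A minimal k-means sketch for building a visual vocabulary. Assumptions: descriptors are plain tuples (2D here instead of 128-dimensional SIFT), and centroids are initialized by a deterministic farthest-point rule, which is a choice made for this sketch rather than something the slides specify.

```python
def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the closest centroid, i.e. the visual word of descriptor p."""
    return min(range(len(centroids)), key=lambda j: dist2(p, centroids[j]))

def kmeans(points, k, n_iters=20):
    # Farthest-point initialization (deterministic, an assumption of this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(n_iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

descriptors = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
vocabulary, words = kmeans(descriptors, k=2)
print(words)                         # visual word assigned to each training descriptor
print(nearest((9, 9), vocabulary))   # quantizing a new descriptor
```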

slide-69
SLIDE 69

VG: Visual Words

  • Each group of patches belongs to the same visual word.

slide-70
SLIDE 70

The Visual Analogy

Descriptor, Centroid, Frame/Image, Film/Image Set

slide-71
SLIDE 71

Image Retrieval Using Visual Words

  • Vocabulary construction (offline)
  • Database construction (offline)
  • Image retrieval (online)
slide-72
SLIDE 72

VG: Stop List

  • The most frequent visual words that occur in almost all images are suppressed.

Before stop list→ After stop list→
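
A small sketch of a stop list. It assumes each image is a list of visual word ids and that the `n_stop` words appearing in the most images are suppressed; the cut-off `n_stop` is a hypothetical parameter, not a value from the slides.

```python
from collections import Counter

def build_stop_list(docs, n_stop):
    """Return the n_stop visual words that appear in the largest number of images."""
    doc_freq = Counter(w for doc in docs for w in set(doc))
    return {w for w, _ in doc_freq.most_common(n_stop)}

def apply_stop_list(docs, stop_words):
    return [[w for w in doc if w not in stop_words] for doc in docs]

images = [[1, 2, 3], [1, 4], [1, 2, 5], [1, 6]]
stop_words = build_stop_list(images, n_stop=1)
filtered = apply_stop_list(images, stop_words)
print(stop_words, filtered)
```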

slide-73
SLIDE 73

VG: Soft Assignment

  • Count in one bin is spread to neighbouring bins.
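
A hedged sketch of soft assignment: instead of voting for a single visual word, a descriptor spreads a unit count over its `r` nearest centroids with Gaussian weights. The weighting function and the parameters `r` and `sigma` are assumptions chosen for illustration.

```python
import math

def soft_assign(descriptor, centroids, r=2, sigma=1.0):
    """Spread one count over the r nearest visual words; weights sum to 1."""
    d2 = [sum((a - b) ** 2 for a, b in zip(descriptor, c)) for c in centroids]
    nearest = sorted(range(len(centroids)), key=lambda j: d2[j])[:r]
    d2_min = d2[nearest[0]]  # subtracting the minimum avoids underflow to all zeros
    weights = {j: math.exp(-(d2[j] - d2_min) / (2 * sigma ** 2)) for j in nearest}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}

centroids = [(0, 0), (2, 0), (10, 0)]
print(soft_assign((1, 0), centroids))    # equidistant from words 0 and 1
print(soft_assign((0.1, 0), centroids))  # mostly word 0
```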
slide-74
SLIDE 74

Vocabulary Construction Summary

Subset of 48 shots is selected → Select regions (SA+MS) → Frame tracking → Reject unstable regions → SIFT descriptors → Cluster descriptors using K-means → Parameter tuning

10k frames = 10% of movie; 10k frames × 1600 = 16,000,000 regions; ~200k regions

slide-75
SLIDE 75

Image Retrieval Using Visual Words

  • Vocabulary construction (offline)
  • Database construction (offline)
  • Image retrieval (online)
slide-76
SLIDE 76

tf-idf Vector

  • Documents -> vectors of word frequencies
  • Term frequency – inverse document frequency
  • Downweight words that appear often in the database

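$$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$$

  • $n_{id}$: number of occurrences of word i in document d
  • $n_d$: number of words in document d
  • $N$: total number of documents in the database
  • $n_i$: total number of occurrences of word i in the database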
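
A short sketch of the weighting, assuming the usual Video Google form t_i = (n_id / n_d) * log(N / n_i) with the four quantities defined on this slide. Documents are lists of visual word ids and the function name `tfidf_vectors` is illustrative.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: word -> weight) per document."""
    n_docs = len(docs)                                # N
    n_i = Counter(w for doc in docs for w in doc)     # occurrences of word i in the database
    vectors = []
    for doc in docs:
        n_d = len(doc)                                # number of words in document d
        counts = Counter(doc)                         # n_id for each word i
        vectors.append({w: (n_id / n_d) * math.log(n_docs / n_i[w])
                        for w, n_id in counts.items()})
    return vectors

docs = [[0, 0, 1], [1, 2], [1], [1, 3]]
vectors = tfidf_vectors(docs)
print(vectors[0])
```

Word 1 occurs in every document and gets weight 0, which is the intended downweighting. Note that with occurrence counts n_i can exceed N and produce negative weights; many implementations use the number of documents containing word i instead.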

slide-77
SLIDE 77

VG: Inverted File Index

  • Word -> a list of all documents (with frequencies)

Word ID 1 2 3 4 … N Document ID 1 2 3 4 … K
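
A minimal sketch of building such an index, assuming each document is a list of visual word ids and document ids are list positions.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map each word id to a list of (document id, frequency) postings."""
    index = defaultdict(list)
    for doc_id, words in enumerate(docs):
        for word, freq in sorted(Counter(words).items()):
            index[word].append((doc_id, freq))
    return dict(index)

index = build_inverted_index([[2, 2, 7], [7, 9]])
print(index)
```

At query time only the posting lists of the words present in the query need to be read, so most of the database is never touched.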

slide-78
SLIDE 78

Crawling Movies Summary

Select key frames → Select regions (SA+MS) → Frame tracking → Reject unstable regions → SIFT descriptors → Vector quantization → Stop list → Tf-idf weighting → Inverted index

Stages: Vocabulary Construction, Database Construction

slide-79
SLIDE 79

Image Retrieval Using Visual Words

  • Vocabulary construction (offline)
  • Database construction (offline)
  • Image retrieval (online)
  • Building query vector
  • Ranking results
slide-80
SLIDE 80

Querying

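Inverted file:

word id   image ids
1         8
2         1, 17
3         12
4         3, 10
5         6, 9
6         1, 4, 5, 7
7         9, 16
…
N

[Figure: a query image and database images labelled with visual words (w2, w4, w5, w6, w7, w9, w20, w22), linked through the inverted file to candidate images 1, 2, 3, 17, …]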
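
A sketch of the lookup using the inverted file entries listed on this slide. The query words (2 and 6) are a hypothetical example, and ranking by a simple vote count stands in for the tf-idf distance used in practice.

```python
from collections import Counter

# Inverted file from the slide: word id -> image ids.
inverted_file = {
    1: [8], 2: [1, 17], 3: [12], 4: [3, 10],
    5: [6, 9], 6: [1, 4, 5, 7], 7: [9, 16],
}

def query(words, inverted_file):
    """Return (image id, votes) pairs, most votes first, ties by image id."""
    votes = Counter()
    for w in set(words):
        for image_id in inverted_file.get(w, []):
            votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))

ranked = query([2, 6], inverted_file)  # hypothetical query containing words 2 and 6
print(ranked)
```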

slide-81
SLIDE 81

VG: Rerank by Spatial Consistency

  • Neighbouring matches in the query region lie in a surrounding area in the retrieved frame.

Before → After →
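
A rough sketch of a spatial consistency score, under assumptions: each match is a (query point, frame point) pair, the "surrounding area" is approximated by the k nearest matched points, and a match receives one vote for every neighbour it shares between the query region and the retrieved frame. The value of `k` is illustrative.

```python
def dist2(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def knn(points, i, k):
    """Indices of the k points nearest to points[i], excluding i itself."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: dist2(points[i], points[j]))
    return set(others[:k])

def spatial_consistency(matches, k=2):
    """Return (total votes, votes per match); frames can be re-ranked by total votes."""
    q_pts = [m[0] for m in matches]
    f_pts = [m[1] for m in matches]
    votes = [len(knn(q_pts, i, k) & knn(f_pts, i, k)) for i in range(len(matches))]
    return sum(votes), votes

# Toy case: the frame points are the query points shifted by (10, 10).
q = [(0, 0), (1, 0), (0, 1), (5, 5)]
matches = [(p, (p[0] + 10, p[1] + 10)) for p in q]
total, votes = spatial_consistency(matches)
print(total, votes)
```

Matches with zero votes can be rejected before the frame's score is computed.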

slide-82
SLIDE 82

VG: Query Expansion

Query image, New query, Results, Spatial verification, New results

Ondrej Chum
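
One simple form of query expansion, sketched under assumptions: the new query is the average of the original query vector and the vectors of the spatially verified results, all stored as sparse dicts from word id to weight. Other expansion strategies exist; the name `expand_query` is illustrative.

```python
def expand_query(query_vec, verified_vecs):
    """Average the query vector with the vectors of spatially verified results."""
    vecs = [query_vec] + list(verified_vecs)
    words = set().union(*vecs)
    return {w: sum(v.get(w, 0.0) for v in vecs) / len(vecs) for w in words}

new_query = expand_query({1: 1.0}, [{1: 1.0, 2: 2.0}])
print(new_query)
```

The expanded vector is then submitted as a new query, which can retrieve images that share words with the verified results but not with the original query.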

slide-83
SLIDE 83

“Google Like” Object Query

Compute descriptors → Build query vector → Use inverted index to find relevant frames → Calculate distance to relevant frames → Rank results → Re-rank by spatial consistency → Query expansion (optional)

slide-84
SLIDE 84

Large Scale Image Retrieval

  • Vocabulary construction (offline)
  • extract affine covariant regions
  • compute descriptor
  • cluster the descriptors into visual words (k-means)
  • Database construction (offline)
  • tf-idf vectors for each document
  • inverted indices from visual words to images
  • Image retrieval (online)
  • extract visual words and compute a tf-idf vector for the query image
  • retrieve the top image candidates
  • (optional) re-rank matches using spatial consistency
  • (optional) expand the answer set by re-submitting highly ranked matches

Richard Szeliski, 2010

slide-85
SLIDE 85

Instance Recognition Application

slide-86
SLIDE 86

Planar Object Recognition

  • Recognition of flat textured objects (CD covers, book covers, etc.)

slide-87
SLIDE 87

Fingerprint Recognition

slide-88
SLIDE 88

Automatic Discovery of Landmarks

Quack, Leibe & Van Gool (2008)

slide-89
SLIDE 89

Cross-domain Image Matching

Shrivastava et al. (2011)

slide-90
SLIDE 90

Summary

  • History of ideas in Recognition
  • 3D object modeling and recognition
  • Large scale image retrieval
  • Instance recognition application
slide-91
SLIDE 91

Thank You