Object Recognition. 16-385 Computer Vision (Kris Kitani), Carnegie Mellon University. (PowerPoint presentation transcript)



SLIDE 1

Object Recognition

16-385 Computer Vision (Kris Kitani)

Carnegie Mellon University

Henderson and Davis. Shape recognition using hierarchical Constraint Analysis. 1979

SLIDE 2

What do we mean by ‘object recognition’?

SLIDE 3

Is this a street light? (Verification / classification)

SLIDE 4

Where are the people? (Detection)

SLIDE 5

Is that Potala palace? (Identification)

SLIDE 6

What’s in the scene? (Semantic segmentation)

Building Mountain Trees Vendors People Ground Sky

SLIDE 7

What type of scene is it? (Scene categorization)

Outdoor City Marketplace

SLIDE 8

Challenges

(Object Recognition)

SLIDE 9

Viewpoint variation

SLIDE 10

Illumination variation

SLIDE 11

Scale variation

SLIDE 12

Background clutter

SLIDE 13

Deformation

SLIDE 14

Occlusion

SLIDE 15

Intra-class variation

SLIDE 16

Common approaches

SLIDE 17

Common approaches to object recognition:

  • Feature matching
  • Spatial reasoning
  • Window classification

SLIDE 18

Feature matching

SLIDE 19

What object do these parts belong to?

SLIDE 20

An object as a collection of local features (bag-of-features).

Some local features are very informative. Are the positions of the parts important?

  • deals well with occlusion
  • scale invariant
  • rotation invariant
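The bag-of-features idea above can be sketched in a few lines: quantize each local descriptor to its nearest "visual word" and keep only the histogram of word counts, discarding positions. This is a toy illustration with random data; a real system would learn the vocabulary with k-means over SIFT-like descriptors.

```python
import numpy as np

# Minimal bag-of-features sketch (toy data, hypothetical vocabulary).
def bag_of_features(descriptors, vocabulary):
    """Quantize each local descriptor to its nearest visual word and
    return a normalized histogram of word counts (positions discarded)."""
    # distances: (num_descriptors, K)
    d = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    words = d.argmin(axis=1)  # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    return hist / hist.sum()  # order-free representation of the object

rng = np.random.default_rng(0)
vocab = rng.normal(size=(5, 8))   # K=5 visual words, 8-D toy descriptors
desc = rng.normal(size=(20, 8))   # 20 local features from one image
h = bag_of_features(desc, vocab)
print(h.shape, round(h.sum(), 6))  # (5,) 1.0
```

Because only counts survive, the representation is naturally robust to occlusion and part rearrangement, which is exactly the pro/con trade-off listed on the next slide.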
SLIDE 21

Pros

  • Simple
  • Efficient algorithms
  • Robust to deformations

Cons

  • No spatial reasoning
SLIDE 22

Common approaches to object recognition:

  • Feature matching
  • Spatial reasoning
  • Window classification

SLIDE 23

Spatial reasoning

SLIDE 24

The position of every part depends on the positions of all the other parts (positional dependence).

Many parts, many dependencies!

SLIDE 25
  • 1. Extract features
  • 2. Match features
  • 3. Spatial verification
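The three steps above can be sketched on toy data (hypothetical arrays standing in for a real detector and descriptor): match descriptors by nearest neighbor, then verify that the matched pairs agree on a single geometric transform, here a pure translation.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1. "Extract" features: locations + descriptors for a model and a test image.
model_xy = rng.uniform(0, 100, size=(30, 2))
model_desc = rng.normal(size=(30, 16))
offset = np.array([12.0, -7.0])       # true object translation (toy)
test_xy = model_xy + offset           # same parts, shifted in the test image
test_desc = model_desc + 0.01 * rng.normal(size=model_desc.shape)

# 2. Match: each test descriptor to its nearest model descriptor.
d = np.linalg.norm(test_desc[:, None] - model_desc[None, :], axis=2)
nearest = d.argmin(axis=1)

# 3. Spatial verification: correct matches should agree on one translation.
shifts = test_xy - model_xy[nearest]
median_shift = np.median(shifts, axis=0)
inliers = np.linalg.norm(shifts - median_shift, axis=1) < 2.0
print(int(inliers.sum()))  # 30 geometrically consistent matches
```

A real pipeline would use SIFT/ORB features and RANSAC over an affine or homography model, but the structure is the same: appearance proposes matches, geometry filters them.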
SLIDE 26
  • 1. Extract features
  • 2. Match features
  • 3. Spatial verification
SLIDE 27
  • 1. Extract features
  • 2. Match features
  • 3. Spatial verification

an old idea…

SLIDE 28

Fu and Booth. Grammatical Inference. 1975.
Structural (grammatical) description of a scene.

SLIDE 29

SLIDE 30

1972: Description for the left edge of a face.

SLIDE 31

A more modern probabilistic approach… think of part locations as random variables (RVs).

Vector of RVs: the set of part locations L = {L1, L2, . . . , LM}

SLIDE 32

A more modern probabilistic approach… think of part locations as random variables (RVs).

Vector of RVs: the set of part locations L = {L1, L2, . . . , LM}, over an image with N pixels.

What are the dimensions of the RV L? How many possible combinations of part locations are there?

SLIDE 33

A more modern probabilistic approach… think of part locations as random variables (RVs).

Vector of RVs: the set of part locations L = {L1, L2, . . . , LM}, where each Lm = [ x y ] is a pixel location in the image.

What are the dimensions of the RV L? How many possible combinations of part locations are there?

SLIDE 34

A more modern probabilistic approach… think of part locations as random variables (RVs).

Vector of RVs: the set of part locations L = {L1, L2, . . . , LM}, where each Lm = [ x y ] is a pixel location in the image (N pixels).

What are the dimensions of the RV L? 2M (an x and a y per part). How many possible combinations of part locations? N^M.
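The N^M count is worth making concrete, since it is the reason naive joint inference over part locations is hopeless (numbers below are illustrative, not from the slides):

```python
# Each of M part locations ranges over all N pixel positions,
# so brute-force search over L has N**M configurations.
N = 640 * 480   # pixels in a VGA image
M = 6           # parts in a small model
configs = N ** M
print(configs)  # about 8.4e32 configurations: exhaustive search is hopeless
```

This combinatorial blow-up is what the factorized priors on the following slides are designed to avoid.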

SLIDE 35

The most likely set of locations L is found by maximizing the posterior:

p(L|I) ∝ p(I|L) p(L)

where I is the image and L is the vector of part locations.

Likelihood p(I|L): how likely it is to observe image I given that the M parts are at locations L (e.g., the scaled output of a classifier).

Prior p(L): a spatial prior that controls the geometric configuration of the parts.

What kind of prior can we formulate?
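MAP inference under this model can be sketched for a single part on a tiny 1-D "image"; all scores here are made up for illustration, with the classifier output standing in for the log-likelihood:

```python
import numpy as np

np.random.seed(0)
N = 10                                  # candidate locations in a toy 1-D image
log_likelihood = np.random.randn(N)     # stand-in for log p(I | L=i)
prior = np.full(N, 1.0 / N)             # uniform spatial prior p(L=i)
log_posterior = log_likelihood + np.log(prior)  # log of p(I|L) p(L)
L_map = int(np.argmax(log_posterior))   # most likely location
```

With a uniform prior the MAP location is just the classifier's best response; a non-uniform spatial prior (next slides) is what lets geometry override weak appearance evidence.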

SLIDE 36

Given any collection of selfie images, where would you expect the nose to be?

P(Lnose) =?

What would be an appropriate prior?

SLIDE 37

A simple factorized model

Break up the joint probability into smaller (independent) terms

p(L) = ∏m p(Lm)

SLIDE 38

Independent locations

Each feature is allowed to move independently. This does not model the relative locations of the parts at all.

p(L) = ∏m p(Lm)
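Because the independent-locations prior factorizes, the joint maximum decomposes: each part can be placed at its own best location, turning an N^M search into M separate argmaxes. A toy sketch with invented per-part scores:

```python
import numpy as np

# p(L) = prod_m p(L_m): the joint argmax decomposes per part.
rng = np.random.default_rng(2)
M, N = 3, 8                       # 3 parts, 8 candidate pixel positions (toy)
scores = rng.random((M, N))       # stand-in for log p(I|L_m) + log p(L_m)
best = scores.argmax(axis=1)      # one argmax per part: O(M*N), not O(N**M)
```

The price of this efficiency is the con on the slide: nothing ties the parts together, so an "object" whose parts are scattered across the image scores just as well.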

SLIDE 39

Tree structure

(star model)

Represents the locations of all parts relative to a single reference part. Assumes that one reference part is defined
 (who decides this?)

Root (reference) node

p(L) = p(Lroot) ∏m=1…M−1 p(Lm | Lroot)
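A star-model prior can be sketched by scoring each non-root part with a Gaussian on its offset from the root. The mean offsets and scale below are invented for illustration:

```python
import numpy as np

# Star model: p(L) = p(L_root) * prod_m p(L_m | L_root), with each
# conditional a Gaussian on the part's offset from the root part.
def star_log_prior(root, parts, mean_offsets, sigma=2.0):
    """log p(L) up to a constant, assuming a uniform prior on the root."""
    offsets = parts - root                      # positions relative to the root
    return -np.sum((offsets - mean_offsets) ** 2) / (2 * sigma ** 2)

root = np.array([50.0, 50.0])
mean_offsets = np.array([[-10.0, 0.0], [10.0, 0.0], [0.0, 15.0]])  # toy geometry
good = root + mean_offsets      # parts exactly at their expected offsets
bad = good + 8.0                # every part shifted off-model
print(star_log_prior(root, good, mean_offsets) >
      star_log_prior(root, bad, mean_offsets))  # True
```

Conditioning everything on one root keeps inference tractable (each part only interacts with the root), which is exactly the middle ground between the independent and fully connected models.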

SLIDE 40

Fully connected

(constellation model)

Explicitly represents the joint distribution of locations (full positional dependence).

A good model: it captures the relative locations of all parts, BUT it is intractable for even a moderate number of parts.

p(L) = p(L1, . . . , LM)

SLIDE 41

Pros

  • Retains spatial constraints
  • Robust to deformations

Cons

  • Computationally expensive
  • Hard to generalize to large intra-class variation (e.g., modeling chairs)

SLIDE 42

Common approaches: feature matching, spatial reasoning, window classification

SLIDE 43

Window-based

SLIDE 44
Template Matching

  • 1. get image window
  • 2. extract features
  • 3. classify

When does this work and when does it fail? How many templates do you need?
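The window-classification loop can be sketched with raw pixels as the "features" and sum-of-squared-differences as the classifier, on toy data where the template is planted at a known position:

```python
import numpy as np

# Sliding-window template matching sketch (toy image, SSD score).
rng = np.random.default_rng(3)
image = rng.random((20, 20))
template = image[5:9, 7:11].copy()          # plant the template at (5, 7)

th, tw = template.shape
best, best_score = None, np.inf
for y in range(image.shape[0] - th + 1):    # 1. get image window
    for x in range(image.shape[1] - tw + 1):
        window = image[y:y + th, x:x + tw]  # 2. "features" = raw pixels
        score = np.sum((window - template) ** 2)
        if score < best_score:              # 3. classify by nearest template
            best, best_score = (y, x), score
print(best)  # (5, 7)
```

This works when appearance is stable and fails under the challenges listed earlier (viewpoint, illumination, scale, deformation), which is why one template per object is rarely enough.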

SLIDE 45

Per-exemplar

find the ‘nearest’ exemplar, inherit its label

(Figure: an exemplar template and its top hits from the test data.)

SLIDE 46

Template Matching

  • 1. get image window (or region proposals)
  • 2. extract features
  • 3. compare to template

Do this part with one big classifier: ‘end-to-end learning’.

SLIDE 47

Convolutional Neural Networks

Convolution, then pooling, applied to an image patch (raw pixel values). Convolution produces the response of one ‘filter’; pooling takes the max/min response over a region.

A 96 x 96 image convolved with 400 filters (features) of size 8 x 8 generates about 3 million values (89 x 89 x 400). Pooling aggregates statistics and lowers the dimension of the convolution output.
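The convolution arithmetic on this slide checks out directly: a valid (no-padding, stride-1) convolution of an 8 x 8 filter over a 96 x 96 image gives 96 − 8 + 1 = 89 responses per side.

```python
# Convolution output-size arithmetic from the slide.
out = 96 - 8 + 1           # 89 valid filter positions per side
values = out * out * 400   # responses across all 400 filters, before pooling
print(values)              # 3168400, i.e. "about 3 million values"
```

This is why pooling matters: without it, every convolutional layer multiplies the data volume handed to the next stage.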

SLIDE 48

SLIDE 49

630 million connections; 60 million parameters to learn.

Krizhevsky, A., Sutskever, I. and Hinton, G. E. ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.

First layer: 96 ‘filters’ applied with stride 4 to the 224 x 224 input give 224/4 = 56 output positions per side.
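The first-layer arithmetic can be checked the same way (treating the slide's 224/4 as the stride-4 output size, ignoring padding details):

```python
# AlexNet first-layer size arithmetic from the slide.
side = 224 // 4               # stride-4 convolution: ~56 positions per side
responses = side * side * 96  # one response map per filter, 96 filters
print(side, responses)        # 56 and 301056 first-layer responses
```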

SLIDE 50

Pros

  • Retains spatial constraints
  • Efficient test time performance

Cons

  • Many, many possible windows to evaluate
  • Requires large amounts of data
  • Sometimes (very) slow to train
SLIDE 51

How to write an effective CV resume

SLIDE 52

Deep Learning

+1-DEEP-LEARNING deeplearning@deeplearning http://deeplearning

Summary: Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Experience: Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning

Education: Deep Learning Deep Learning ? Deep Learning Deep Learning Deep Learning

Experience: Deep Learning Deep Learning. Deep Learning Deep Learning, Deep Learning
  · Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning
  · Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning
  · Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning in another country
Deep Learning Deep Learning, Deep Learning, Deep Learning
  · Deep Learning ... wait.. Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning
  · Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning Deep Learning
  · Very Deep Learning

Publications

  • 1. Deep Learning in Deep Learning

People who do Deep Learning things. Conference of Deep Learning.

  • 2. Shallow Learning... Nawww.. Deep Learning bruh Under submission while Deep Learning

Patent

  • 1. System and Method for Deep Learning. Deep Learning, Deep Learning, Deep Learning, Deep Learning