
CS4495 Computer Vision Introduction to Recognition

Aaron Bobick School of Interactive Computing


What does recognition involve?

Source: Fei‐Fei Li, Rob Fergus, Antonio Torralba.


Verification: is that a lamp?


Detection: are there people?


Identification: is that Potala Palace?


Scene labels: mountain, building, tree, banner, vendor, people, street lamp

Object categorization

Scene and context categorization

  • outdoor
  • city


Instance‐level recognition problem

John’s car


Generic categorization problem


Object Categorization

Task: Given a (small) number of training images of a category, recognize a‐priori unknown instances of that category and assign the correct category label.

K. Grauman, B. Leibe

Object Categorization

Which categories are the best for visual identification?

Levels of abstraction: living being, animal, dog, German shepherd, “Fido”


Visual Object Categories

Basic Level Categories in human categorization [Rosch 76, Lakoff 87]

  • The highest level at which category members have similar perceived shape
  • The highest level at which a single mental image reflects the entire category


Visual Object Categories

Basic Level Categories in human categorization [Rosch 76, Lakoff 87]

  • The level at which human subjects are usually fastest at identifying category members
  • The first level named and understood by children
  • The highest level at which a person uses similar motor actions for interaction with category members



How many object categories are there?

Biederman 1987


Other Types of Categories

Functional Categories

e.g. chairs = “something you can sit on”


Other Types of Categories

Ad‐hoc categories

e.g. “something you can find in an office environment”


Words: Why recognition?

  • Recognition is a fundamental part of perception
  • e.g., robots, autonomous agents
  • Organize and give access to visual content
  • Connect to information
  • Detect trends and themes
  • Because it is a very human way of thinking about things…


Autonomous agents able to detect objects

http://www.darpa.mil/grandchallenge/gallery.asp


Labeling people


Posing visual queries

Belhumeur et al.


Finding visually similar objects


So why is this hard?


Challenges: Robustness

Illumination, object pose, clutter

Kristen Grauman


Challenges: Robustness

Occlusions, viewpoint, intra‐class appearance


Challenges: Robustness

Realistic scenes are crowded, cluttered, have overlapping objects.


Challenges: Importance of context

Fei‐Fei, Fergus & Torralba


Challenges: complexity

  • Thousands to millions of pixels in an image
  • 3,000-30,000 human-recognizable object categories
  • 30+ degrees of freedom in the pose of articulated objects (humans)

Kristen Grauman


Challenges: complexity

  • Billions of images indexed by Google Image Search
  • In 2011, 6 billion photos uploaded per month
  • Approximately one billion camera phones sold in 2013
  • About half of the cerebral cortex in primates is devoted to processing visual information [Felleman and van Essen 1991]

Kristen Grauman


So what works?


What worked most reliably “yesterday”

  • Reading license plates (real easy), zip codes, checks

Lana Lazebnik


What worked most reliably “yesterday”

  • Reading license plates, zip codes, checks
  • Fingerprint recognition

Lana Lazebnik


What worked most reliably “yesterday”

  • Reading license plates, zip codes, checks
  • Fingerprint recognition
  • Face detection

(Today: recognition)


What worked most reliably “yesterday”

  • Reading license plates, zip codes, checks
  • Fingerprint recognition
  • Face detection (Today: recognition)
  • Recognition of flat textured objects (CD covers, book covers, etc.)

Lana Lazebnik


Just in: GoogleNet 2014


Just in: GoogleNet – no context needed?


Supervised classification

Given a collection of labeled examples, come up with a function that will predict the labels of new examples.

[Figure: training examples labeled “four” and “nine”, and a novel input “?” to be classified.]

Kristen Grauman
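The idea of predicting labels for new examples from labeled training examples can be sketched with a minimal 1-nearest-neighbor classifier; the 2-D feature vectors below are made-up stand-ins for real digit-image features, not anything from the lecture.

```python
# A minimal sketch of supervised classification (1-nearest-neighbor),
# with hypothetical 2-D features standing in for digit images.
def nearest_neighbor(train, x):
    """train: list of (feature_vector, label) pairs; predict the label of x
    by returning the label of the closest training example."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(train, key=lambda pair: sq_dist(pair[0], x))[1]

train = [((0.0, 0.0), "four"), ((1.0, 1.0), "nine")]
print(nearest_neighbor(train, (0.2, 0.1)))  # closest to the "four" example
```

This is only one possible "function that predicts labels"; the rest of the lecture develops probabilistic alternatives.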


Supervised classification

How good is the function we come up with to do the classification? (What does “good” mean?) It depends on:

  • What mistakes it makes
  • The cost associated with those mistakes

Kristen Grauman


Supervised classification

Since we know the desired labels of the training data, we want to minimize the expected misclassification cost.


Supervised classification

Two general strategies

  • Use the training data to build a representative probability model; separately model class-conditional densities and priors (Generative)
  • Directly construct a good decision boundary, model the posterior (Discriminative)


Supervised classification: Generative

Given labeled training examples, predict labels for new examples

  • Notation: L(4→9) is the loss when the object is a ‘4’ but you call it a ‘9’
  • We’ll assume the cost of a correct classification, L(4→4), is zero.

Kristen Grauman


Supervised classification: Generative

Consider the two‐class (binary) decision problem:

  • L(4→9): Loss of classifying a 4 as a 9
  • L(9→4): Loss of classifying a 9 as a 4

Kristen Grauman


Supervised classification: Generative

Risk of a classifier strategy S is the expected loss:

R(S) = Pr(4→9 | using S) · L(4→9) + Pr(9→4 | using S) · L(9→4)

We want to choose a classifier so as to minimize this total risk.

Kristen Grauman
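The risk formula above is just a weighted sum of the two error probabilities, so it can be sketched directly; the error rates and loss values below are illustrative numbers, not from the lecture.

```python
# Expected risk of a classifier strategy S:
# R(S) = Pr(4->9 | using S) * L(4->9) + Pr(9->4 | using S) * L(9->4)
def risk(pr_4_as_9, pr_9_as_4, loss_4_to_9, loss_9_to_4):
    return pr_4_as_9 * loss_4_to_9 + pr_9_as_4 * loss_9_to_4

# Illustrative: rare but expensive 9->4 mistakes dominate the risk,
# since 0.01 * 1.0 + 0.02 * 10.0 = 0.21.
print(risk(0.01, 0.02, 1.0, 10.0))
```

Making one kind of mistake more costly is exactly what shifts the optimal decision boundary in the next slides.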


Supervised classification: minimal risk

At the best decision boundary (a feature value x), either choice of label yields the same expected loss.

If we choose class “four” at the boundary, the expected loss is:

P(class is 9 | x) · L(9→4) + P(class is 4 | x) · L(4→4) = P(class is 9 | x) · L(9→4)

(the second term vanishes because L(4→4) = 0). If we choose class “nine” at the boundary, the expected loss is:

P(class is 4 | x) · L(4→9)

Kristen Grauman


Supervised classification: minimal risk

So the best decision boundary is at the point x where:

P(class is 9 | x) · L(9→4) = P(class is 4 | x) · L(4→9)

To classify a new point, choose the class with the lowest expected loss; i.e., choose “four” if:

P(4 | x) · L(4→9) > P(9 | x) · L(9→4)

Kristen Grauman
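The rule of choosing "four" exactly when P(4|x)·L(4→9) exceeds P(9|x)·L(9→4) can be sketched as a few lines of code; the posterior and loss values are hypothetical.

```python
# Minimal-risk decision rule for the two-class problem:
# choose "four" iff P(4|x) * L(4->9) > P(9|x) * L(9->4).
def classify(p4_given_x, loss_4_to_9=1.0, loss_9_to_4=1.0):
    p9_given_x = 1.0 - p4_given_x  # two classes, so posteriors sum to 1
    if p4_given_x * loss_4_to_9 > p9_given_x * loss_9_to_4:
        return "four"
    return "nine"

print(classify(0.6))                    # symmetric losses: most probable class wins
print(classify(0.6, loss_9_to_4=10.0))  # costly 9->4 mistakes shift the boundary
```

With equal losses this reduces to picking the more probable class; unequal losses move the boundary toward the cheaper mistake.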


Supervised classification: minimal risk

At the best decision boundary, either choice of label yields the same expected loss, so the boundary is at the point x where:

P(class is 9 | x) · L(9→4) = P(class is 4 | x) · L(4→9)

How do we evaluate these probabilities?

Kristen Grauman


Example: learning skin colors

[Figure: histograms of the percentage of skin pixels in each bin, giving P(x | skin) and P(x | not skin) over the feature x = hue.]

Kristen Grauman
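Estimating a likelihood such as P(x | skin) from labeled pixels amounts to building a normalized histogram of their hue values; the sketch below uses made-up hue values and a tiny bin count, where real training data would come from labeled images.

```python
# Estimate p(x | class) as a normalized hue histogram from labeled pixels.
def hue_histogram(hues, n_bins=4):
    """Normalized histogram over hue values in [0, 1)."""
    counts = [0] * n_bins
    for h in hues:
        counts[min(int(h * n_bins), n_bins - 1)] += 1
    return [c / len(hues) for c in counts]

skin_hues = [0.02, 0.05, 0.06, 0.30]  # hypothetical skin-pixel hues
p_x_given_skin = hue_histogram(skin_hues)
print(p_x_given_skin)  # bins sum to 1, so each entry is P(bin | skin)
```

Repeating this for the non-skin pixels gives the second histogram, P(x | not skin), needed for the Bayes rule that follows.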


Example: learning skin colors

Now we get a new image, and want to label each pixel as skin or non‐skin.

[Figure: the learned likelihoods P(x | skin) and P(x | not skin) over the feature x = hue.]

Kristen Grauman


Bayes rule

P(skin | x) = P(x | skin) · P(skin) / P(x), i.e. posterior ∝ likelihood × prior

Where does the prior P(skin) come from?


Bayes rule in (ab)use

Likelihood ratio test (assuming the cost of errors is the same): if P(skin | x) > P(not skin | x), classify x as skin. By Bayes rule, that is equivalent to: if P(x | skin) · P(skin) > P(x | not skin) · P(not skin), classify as skin. (If the costs are different, just re‐weight the two sides.)
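The likelihood ratio test, classifying as skin when P(x | skin)·P(skin) exceeds P(x | not skin)·P(not skin), can be sketched directly; the likelihood and prior values below are illustrative, not measured.

```python
# Likelihood ratio test via Bayes rule: skin iff
# p(x|skin) * P(skin) > p(x|not skin) * P(not skin).
def is_skin(p_x_given_skin, p_x_given_not_skin, prior_skin=0.5):
    return p_x_given_skin * prior_skin > p_x_given_not_skin * (1.0 - prior_skin)

print(is_skin(0.30, 0.10))                  # equal priors: likelihoods decide
print(is_skin(0.30, 0.10, prior_skin=0.2))  # a small skin prior can flip the decision
```

Note how the second call shows the role of the prior: the same likelihoods give the opposite label once skin is assumed rare.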


Bayes rule in (ab)use

… but I don’t really know the prior … but I can assume it is some constant Ω … so with some training data I can estimate Ω … and with the same training data I can measure the likelihood densities of both classes. So… I can more or less come up with a rule…

Steve Seitz


Example: classifying skin pixels

Now for every pixel in a new image, we can estimate the probability that it is generated by skin: if that probability exceeds the threshold, classify as skin; otherwise not.

Brighter pixels have higher probability of being skin.

Kristen Grauman


Example: classifying skin pixels

Gary Bradski, 1998


More general generative models

For a given measurement x and set of classes c, choose c* by:

c* = argmax_c p(c | x) = argmax_c p(c) p(x | c)
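The MAP rule c* = argmax_c p(c) p(x|c) is a one-line maximization once priors and likelihood functions are in hand; the two classes, their priors, and the step-function likelihoods below are all hypothetical.

```python
# MAP classification: c* = argmax_c p(c) * p(x|c).
# priors: class -> prior probability; likelihoods: class -> density function of x.
def classify_map(x, priors, likelihoods):
    return max(priors, key=lambda c: priors[c] * likelihoods[c](x))

priors = {"skin": 0.3, "not_skin": 0.7}
likelihoods = {
    "skin": lambda x: 0.9 if x < 0.1 else 0.1,  # crude hue density for skin
    "not_skin": lambda x: 0.2,                  # flat density for everything else
}
print(classify_map(0.05, priors, likelihoods))
```

The same code handles any number of classes, since argmax just scans the candidates.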


Continuous generative models

  • If x is continuous, we need a likelihood density model of p(x|c)
  • Typically parametric: a Gaussian or a mixture of Gaussians

[Figure: a single Gaussian density vs. a mixture of Gaussians.]
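Fitting the simplest parametric choice, a single 1-D Gaussian per class, is just estimating a mean and variance from the class samples; the sample values below are made up, and a mixture of Gaussians would extend this with several weighted components.

```python
import math

# Fit a 1-D Gaussian likelihood model p(x|c) from class samples.
def fit_gaussian(samples):
    mu = sum(samples) / len(samples)
    var = sum((s - mu) ** 2 for s in samples) / len(samples)
    return mu, var

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

mu, var = fit_gaussian([1.0, 2.0, 3.0])
print(mu, var)  # mean 2.0, (population) variance 2/3
```

With only two parameters per class, far less data is needed than a histogram over every bin, which is the point the next slide makes.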


Continuous generative models

  • Why not just a histogram or some KNN (Parzen window) method?
  • You might…
  • But you would need lots and lots of data everywhere you might get a point
  • The whole point of modeling with a parameterized model is not to need lots of data.


Summary of generative models:

+ Firm probabilistic grounding
+ Allows inclusion of prior knowledge
+ Parametric modeling of likelihood permits using a small number of examples
+ New classes do not perturb previous models
+ Others: can take advantage of unlabelled data; can be used to generate samples


Summary of generative models:

- And just where did you get those priors?
- Why are you modeling those obviously non-C points?
- The example hard cases aren’t special
- If you have lots of data, it doesn’t help


Next time…

  • A really cool way of building a generative model for face recognition (not detection)
  • And then discriminative models…