

SLIDE 1

Classification2 CS 4495 Computer Vision – A. Bobick

CS 4495 Computer Vision – Classification 2

Aaron Bobick, School of Interactive Computing

SLIDE 2

Administrivia

  • PS5 (still) due on Friday Nov 14, 11:55pm
  • Remember, MSCS and undergrad are on different curves
  • We will not be unfair
  • Hopefully PS6 out Friday Nov 14, due Nov 23rd
  • Reminder: Problem set resubmission policy:
  • Full questions only
  • By email to me and the TAs.
  • You get 50% credit to replace whatever you got last time on that question.
  • Must be submitted by: DEC 1. NO EXCEPTIONS.
SLIDE 3

Last time: Supervised classification

  • Given a collection of labeled examples, come up with a function that will predict the labels of new examples.

[Figure: training examples labeled "four" and "nine", and a novel input "?"]

Kristen Grauman

SLIDE 4

Supervised classification

  • Since we know the desired labels of training data, we want to minimize the expected misclassification
  • Two general strategies:
  • Use the training data to build a representative probability model; separately model class-conditional densities and priors (Generative)
  • Directly construct a good decision boundary, model the posterior (Discriminative)

SLIDE 5

Supervised classification: minimal risk

Feature value x

The optimal classifier will minimize total risk. At the decision boundary, either choice of label yields the same expected loss, so the best decision boundary is at the point x where

P(class is 9 | x) L(9→4) = P(class is 4 | x) L(4→9)

To classify a new point, choose the class with the lowest expected loss; i.e., choose "four" if

P(4 | x) L(4→9) > P(9 | x) L(9→4)

Kristen Grauman

SLIDE 6

Supervised classification: minimal risk

Feature value x

The optimal classifier will minimize total risk. At the decision boundary, either choice of label yields the same expected loss, so the best decision boundary is at the point x where

P(class is 9 | x) L(9→4) = P(class is 4 | x) L(4→9)

To classify a new point, choose the class with the lowest expected loss; i.e., choose "four" if

P(4 | x) L(4→9) > P(9 | x) L(9→4)

How to evaluate these probabilities?

Kristen Grauman

SLIDE 7

Example: learning skin colors

  • We can represent a class-conditional likelihood density

using a histogram (a “non-parametric” distribution)

[Figure: histograms of P(x|skin) and P(x|not skin) over feature x = Hue; bin height is the percentage of skin pixels in each bin]

Kristen Grauman

SLIDE 8

Bayes rule

P(skin | x) = P(x | skin) P(skin) / P(x)

posterior ~ likelihood × prior

P(skin | x) ∝ P(x | skin) P(skin)

Where does the prior come from?

Why even use a prior?

Source: Steve Seitz
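The Bayes rule above can be sketched numerically. This is a minimal illustration, not the course's code: the 4-bin hue histograms and the prior value are hypothetical, and P(x) is expanded over the two classes.

```python
import numpy as np

def skin_posterior(x_bin, p_x_given_skin, p_x_given_not, p_skin):
    """Bayes rule: P(skin | x) = P(x|skin) P(skin) / P(x),
    with the evidence P(x) summed over skin and not-skin."""
    num = p_x_given_skin[x_bin] * p_skin
    den = num + p_x_given_not[x_bin] * (1.0 - p_skin)
    return num / den if den > 0 else 0.0

# Hypothetical 4-bin hue likelihood histograms (each sums to 1).
p_x_given_skin = np.array([0.1, 0.6, 0.2, 0.1])
p_x_given_not  = np.array([0.4, 0.1, 0.2, 0.3])

post = skin_posterior(1, p_x_given_skin, p_x_given_not, p_skin=0.3)
```

With these made-up histograms, a hue falling in bin 1 gets a high skin posterior even though the prior favors non-skin, because the likelihood ratio is strongly in skin's favor.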

SLIDE 9

Example: classifying skin pixels

Now for every pixel in a new image, we can estimate the probability that it was generated by skin, and classify pixels based on these probabilities.

Brighter pixels → higher probability of being skin

Kristen Grauman

SLIDE 10

Some challenges for generative models

Generative approaches were some of the first methods in pattern recognition.

  • Easy to model analytically, and could be done with modest amounts of moderate-dimensional data.
SLIDE 11

Some challenges for generative models

But for the modern world there are some liabilities:

  • Many signals are high-dimensional, and representing the complete density of a class is data-hard.
  • In some sense, we don't care about modeling the classes; we only care about making the right decisions.
  • Model the hard cases – the ones near the boundaries!
  • We don't typically know which features of instances actually discriminate between classes.

SLIDE 12

So…

  • If we have only a fixed number of label types…
  • If what matters is getting the label right…
  • If we're not sure which features/properties are even important in defining the classes… then…
  • We want to focus on discriminating between the class types.
  • We want the machine to somehow learn the features that matter.
  • This gets us to discriminative classification.
SLIDE 13

Discriminative methods: assumptions

Going forward we're going to make some assumptions:

  • There are a fixed number of known classes.
  • Ample number of training examples of each class.
  • Equal cost of making mistakes – what matters is getting the label right.
  • Need to construct a representation of the instance, but we don't know a priori what features are diagnostic of the class label.

SLIDE 14

Generic category recognition: basic framework

Train

  • Build an object model – a representation
  • Describe training instances (here, images)
  • Learn/train a classifier

Test

  • Generate candidates in new image
  • Score the candidates
SLIDE 15

Window-based models

Simple holistic descriptions of image content:

  • grayscale / color histogram
  • vector of pixel intensities

Kristen Grauman

SLIDE 16

Window-based models

  • Pixel-based representations are sensitive to small shifts
  • Color or grayscale-based appearance description can be sensitive to illumination and intra-class appearance variation

Kristen Grauman

SLIDE 17

Window-based models

  • Consider edges, contours, and (oriented) intensity gradients

Kristen Grauman

SLIDE 18

Window-based models

  • Consider edges, contours, and (oriented) intensity gradients
  • Summarize local distribution of gradients with a histogram
  • Locally orderless: offers invariance to small shifts and rotations
  • Contrast-normalization: try to correct for variable illumination

Kristen Grauman
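The bullets above can be sketched as one magnitude-weighted orientation histogram over a patch, roughly one cell of a HoG-style descriptor. This is an illustrative simplification, not the descriptor used in any particular paper; `orientation_histogram` and the 9-bin choice are assumptions.

```python
import numpy as np

def orientation_histogram(patch, n_bins=9):
    """Histogram of gradient orientations over a patch, weighted by
    gradient magnitude, then contrast-normalized (a crude HoG cell)."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    return hist / (np.linalg.norm(hist) + 1e-6)      # contrast normalization

# A horizontal intensity ramp: all gradient energy at orientation 0.
patch = np.tile(np.arange(8.0), (8, 1))
h = orientation_histogram(patch)
```

Because the histogram discards where in the patch each gradient occurred, it is "locally orderless": shifting the ramp a pixel or two leaves `h` unchanged.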

SLIDE 19

Generic category recognition: basic framework

Train

  • Build an object model – a representation
  • Describe training instances (here, images)
  • Learn/train a classifier

Test

  • Generate candidates in new image
  • Score the candidates
SLIDE 20

Window-based models

Given the representation, train a binary car/non-car classifier ("Yes, car." / "No, not a car.")

Kristen Grauman

SLIDE 21

Discriminative classifier construction (10^6 examples)

  • Nearest neighbor: Shakhnarovich, Viola, Darrell 2003; Berg, Berg, Malik 2005 …
  • Neural networks: LeCun, Bottou, Bengio, Haffner 1998; Rowley, Baluja, Kanade 1998 …
  • Support Vector Machines: Guyon, Vapnik; Heisele, Serre, Poggio 2001 …
  • Boosting: Viola, Jones 2001; Torralba et al. 2004; Opelt et al. 2006 …
  • Random Forests: Breiman, 1984; Shotton et al., CVPR 2008 …

Slide adapted from Antonio Torralba


SLIDE 23

Window-based models: generating and scoring candidates (car/non-car classifier)

Kristen Grauman

SLIDE 24

Window-based object detection

Training:
1. Obtain training data
2. Define features
3. Define classifier

Given new image:
1. Slide window
2. Score by classifier

Kristen Grauman
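The test-time loop above ("slide window, score by classifier") can be sketched directly. This is a toy, assuming a single scale and a scalar `score_fn` standing in for a trained classifier; real detectors also scan scales and suppress overlapping hits.

```python
import numpy as np

def sliding_window_detect(image, window, step, score_fn, threshold):
    """Slide a window over the image; keep (row, col, score) for every
    placement the classifier scores above threshold."""
    H, W = image.shape
    h, w = window
    detections = []
    for r in range(0, H - h + 1, step):
        for c in range(0, W - w + 1, step):
            s = score_fn(image[r:r + h, c:c + w])
            if s > threshold:
                detections.append((r, c, s))
    return detections

# Toy "classifier": mean brightness; a bright square at (8, 8) should fire.
img = np.zeros((16, 16))
img[8:12, 8:12] = 1.0
hits = sliding_window_detect(img, window=(4, 4), step=4,
                             score_fn=np.mean, threshold=0.9)
```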

SLIDE 25

Discriminative classification methods

Discriminative classifiers find a division (surface) in feature space that separates the classes. Several methods:

  • Nearest neighbors
  • Boosting
  • Support Vector Machines
SLIDE 27

Nearest Neighbor classification

  • Assign the label of the nearest training data point to each test data point

Voronoi partitioning of feature space for 2-category 2D data (from Duda et al.)

Black = negative, Red = positive. The novel test example is closest to a positive example from the training set, so classify it as positive.

SLIDE 28

K-Nearest Neighbors classification

  • For a new point, find the k closest points from the training data
  • Labels of the k points "vote" to classify

Example (k = 5): if the query lands here, the 5 NN consist of 3 negatives and 2 positives, so we classify it as negative. Black = negative, Red = positive.

Source: D. Lowe
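The k-NN vote can be sketched in a few lines. A minimal version under simple assumptions: Euclidean distance, no tie-breaking rule beyond `np.unique` order, and toy 2D data.

```python
import numpy as np

def knn_classify(X_train, y_train, query, k=5):
    """Label a query by majority vote among its k nearest
    training points (Euclidean distance)."""
    d = np.linalg.norm(X_train - query, axis=1)
    nearest = y_train[np.argsort(d)[:k]]        # labels of the k closest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]            # majority vote

# Toy 2D data: negatives near the origin, positives near (5, 5).
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])
label = knn_classify(X, y, np.array([0.5, 0.5]), k=3)
```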

SLIDE 29

Discriminative classification methods

Discriminative classifiers find a division (surface) in feature space that separates the classes. Several methods:

  • Nearest neighbors
  • Boosting
  • Support Vector Machines
SLIDE 31

Boosting: Training method

  • Initially, weight each training example equally
  • In each boosting round:
  • Find the weak learner that achieves the lowest weighted training error
  • Raise weights of training examples misclassified by the current weak learner
  • Compute the final classifier as a linear combination of all weak learners (weight of each learner is directly proportional to its accuracy)

Slide credit: Lana Lazebnik

SLIDE 32

Boosting intuition

Weak Classifier 1

Slide credit: Paul Viola

SLIDE 33

Boosting: Training method

  • In each boosting round:
  • Find the weak learner that achieves the lowest weighted training error
  • Raise weights of training examples misclassified by the current weak learner

Slide credit: Lana Lazebnik


SLIDE 35

Boosting intuition

Weak Classifier 1

Slide credit: Paul Viola

SLIDE 36

Boosting illustration

Weights Increased

SLIDE 37

Boosting illustration

Weak Classifier 2

SLIDE 38

Boosting illustration

Weights Increased

SLIDE 39

Boosting illustration

Weak Classifier 3

SLIDE 40

Boosting illustration

Final classifier is a combination of weak classifiers

SLIDE 41

Boosting: Training method

  • Initially, weight each training example equally
  • In each boosting round:
  • Find the weak learner that achieves the lowest weighted training error
  • Raise weights of training examples misclassified by the current weak learner
  • Compute the final classifier as a linear combination of all weak learners (weight of each learner is directly proportional to its accuracy)
  • Exact formulas for re-weighting and combining weak learners depend on the particular boosting scheme (e.g., AdaBoost)

Slide credit: Lana Lazebnik

SLIDE 42

Viola-Jones face detector

SLIDE 43

Viola-Jones face detector – main ideas:

  • Represent local texture with efficiently computable "rectangular" features within a window of interest
  • Select discriminative features to be weak classifiers
  • Use a boosted combination of them as the final classifier
  • Form a cascade of such classifiers, rejecting clear negatives quickly

Kristen Grauman

SLIDE 44

Viola-Jones detector: features

"Rectangular" filters: the feature output is the difference between sums over adjacent regions. Efficiently computable with the integral image: any rectangular sum can be computed in constant time.

Integral image: the value at (x,y) is the sum of all pixels above and to the left of (x,y).

Kristen Grauman

SLIDE 45

Computing sum within a rectangle

  • Let A, B, C, D be the values of the integral image at the corners of a rectangle
  • Then the sum of original image values within the rectangle can be computed as: sum = A – B – C + D
  • Only 3 additions are required for any size of rectangle!

Lana Lazebnik
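The corner arithmetic above can be sketched in code. A minimal version with one assumed corner convention (A at the bottom-right of the rectangle; B, C, D just outside its top/left edges, taken as zero at the image border); the function names are illustrative.

```python
import numpy as np

def integral_image(img):
    """ii[r, c] = sum of all pixels above and to the left, inclusive."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, r0, c0, r1, c1):
    """Sum of img[r0:r1+1, c0:c1+1] from four corner lookups:
    sum = A - B - C + D."""
    A = ii[r1, c1]
    B = ii[r0 - 1, c1] if r0 > 0 else 0          # strip above the rectangle
    C = ii[r1, c0 - 1] if c0 > 0 else 0          # strip left of the rectangle
    D = ii[r0 - 1, c0 - 1] if r0 > 0 and c0 > 0 else 0   # counted twice, add back
    return A - B - C + D

img = np.arange(16).reshape(4, 4)
ii = integral_image(img)
total = rect_sum(ii, 1, 1, 2, 2)   # sum of img[1:3, 1:3]
```

Once `ii` is built, every rectangular feature costs the same handful of lookups regardless of its size, which is what makes the rectangular filters cheap.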

SLIDE 46

Viola-Jones detector: features

"Rectangular" filters: the feature output is the difference between sums over adjacent regions. Efficiently computable with the integral image: any rectangular sum can be computed in constant time. Avoid scaling images → scale the features directly, for the same cost.

Integral image: the value at (x,y) is the sum of all pixels above and to the left of (x,y).

Kristen Grauman

SLIDE 47

Viola-Jones detector: features

Considering all possible filter parameters (position, scale, and type), there are 180,000+ possible features associated with each 24 x 24 window.

Which subset of these features should we use to determine if a window has a face? Use AdaBoost both to select the informative features and to form the classifier.

Kristen Grauman

SLIDE 48

Viola-Jones detector: AdaBoost

  • Want to select the single rectangle feature and threshold that best separates positive (faces) and negative (non-faces) training examples, in terms of weighted error.

[Figure: outputs of a possible rectangle feature on faces and non-faces; the resulting weak classifier thresholds the feature value.]

For the next round, reweight the examples according to errors, and choose another filter/threshold combo.

Kristen Grauman

SLIDE 49

AdaBoost Algorithm

Freund & Schapire 1995

Given n training examples {x1, …, xn}:

  • Start with uniform weights on the training examples
  • For T rounds:
  • Evaluate the weighted error for each feature; pick the best
  • Re-weight the examples: incorrectly classified → more weight; correctly classified → less weight
  • The final classifier is a combination of the weak ones, weighted according to the error they had.
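The algorithm above can be sketched with one-feature threshold stumps as the weak learners. This is an illustrative AdaBoost, not the Viola-Jones implementation: exhaustive stump search, the standard re-weighting `exp(-alpha * y * h(x))`, and a tiny 1D toy problem.

```python
import numpy as np

def adaboost_train(X, y, T=10):
    """AdaBoost with threshold stumps; y in {-1, +1}.
    Returns a list of (feature, threshold, polarity, alpha)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # start with uniform weights
    learners = []
    for _ in range(T):
        best = None
        for j in range(d):                       # exhaustive stump search
            for thr in np.unique(X[:, j]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                    err = w[pred != y].sum()     # weighted training error
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # accurate learners weigh more
        w *= np.exp(-alpha * y * pred)           # raise weights on mistakes
        w /= w.sum()
        learners.append((j, thr, pol, alpha))
    return learners

def adaboost_predict(learners, X):
    s = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1)
            for j, t, p, a in learners)
    return np.sign(s)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
model = adaboost_train(X, y, T=5)
pred = adaboost_predict(model, X)
```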

SLIDE 50

Viola-Jones Face Detector: Results

First two features selected

SLIDE 51

Viola-Jones face detector – main ideas:

  • Represent local texture with efficiently computable "rectangular" features within a window of interest
  • Select discriminative features to be weak classifiers
  • Use a boosted combination of them as the final classifier
  • Form a cascade of such classifiers, rejecting clear negatives quickly

Kristen Grauman


SLIDE 53

2nd idea: Cascade…

  • Key insight: almost everywhere is a non-face.
  • So… detect non-faces more quickly than faces.
  • And if you say it's not a face, be sure and move on.
SLIDE 54

Cascading classifiers for detection

  • Form a cascade with low false negative rates early on
  • Apply less accurate but faster classifiers first, to immediately discard windows that clearly appear to be negative

Kristen Grauman
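The cascade's early-reject control flow can be sketched as a chain of (scorer, threshold) stages. This is a toy illustration; the lambda "stages" are hypothetical stand-ins for trained boosted classifiers of increasing cost and accuracy.

```python
def cascade_classify(window, stages):
    """Attentional cascade: each stage is (score_fn, threshold).
    Reject as soon as any stage scores below its threshold; only
    windows surviving every stage are declared positive."""
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False           # cheap early reject for clear negatives
    return True

# Hypothetical stages: a cheap mean-brightness test, then a stricter one.
stages = [(lambda w: sum(w) / len(w), 0.2),
          (lambda w: min(w), 0.5)]

kept = cascade_classify([0.9, 0.8, 0.7], stages)
dropped = cascade_classify([0.0, 0.1, 0.0], stages)
```

Because most windows fail the first cheap stage, the expensive later stages run on only a small fraction of the image, which is what makes the detector real-time.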

SLIDE 55

Viola-Jones detector: summary

  • Train with 5K positives, 350M negatives
  • Real-time detector using a 38-layer cascade
  • 6061 features in all layers

Train the cascade of classifiers with AdaBoost on faces and non-faces; apply the selected features, thresholds, and weights to each new image.

[Implementation available in OpenCV: http://www.intel.com/technology/computing/opencv/]

Kristen Grauman

SLIDE 56

Viola-Jones Face Detector: Results

SLIDE 57

Viola-Jones Face Detector: Results

SLIDE 58

Viola-Jones Face Detector: Results

SLIDE 59

Detecting profile faces?

Can we use the same detector?

SLIDE 60

Paul Viola, ICCV tutorial

Viola-Jones Face Detector: Results

SLIDE 61

Example using Viola-Jones detector

Everingham, M., Sivic, J. and Zisserman, A. "Hello! My name is... Buffy" – Automatic naming of characters in TV video, BMVC 2006. http://www.robots.ox.ac.uk/~vgg/research/nface/index.html

Frontal faces detected and then tracked; character names inferred with alignment of script and subtitles.

SLIDE 62

SLIDE 63

Consumer application: iPhoto 2009

http://www.apple.com/ilife/iphoto/

Slide credit: Lana Lazebnik

SLIDE 64

Consumer application: iPhoto 2009

  • Things iPhoto thinks are faces

Slide credit: Lana Lazebnik

SLIDE 65

Viola-Jones detector: summary

  • A seminal approach to real-time object detection
  • Training is slow, but detection is very fast
  • Key ideas:
  • Integral images for fast feature evaluation
  • Boosting for feature selection
  • Attentional cascade of classifiers for fast rejection of non-face windows

  • P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. CVPR 2001.
  • P. Viola and M. Jones. Robust real-time face detection. IJCV 57(2), 2004.
SLIDE 66

Boosting: pros and cons

  • Advantages of boosting:
  • Integrates classification with feature selection
  • Complexity of training is linear in the number of training examples
  • Flexibility in the choice of weak learners, boosting scheme
  • Testing is fast
  • Easy to implement
  • Disadvantages:
  • Needs many training examples
  • Often found not to work as well as an alternative discriminative classifier, the support vector machine (SVM), especially for many-class problems

Slide credit: Lana Lazebnik

SLIDE 67

Discriminative classification methods

Discriminative classifiers find a division (surface) in feature space that separates the classes. Several methods:

  • Nearest neighbors
  • Boosting
  • Support Vector Machines
SLIDE 68

Linear classifiers

SLIDE 69

Lines in R2

ax + cy + b = 0

Let w = [a c]^T and x = [x y]^T

SLIDE 70

Lines in R2

ax + cy + b = 0

Let w = [a c]^T and x = [x y]^T. Then:

w · x + b = 0

SLIDE 71

Lines in R2

ax + cy + b = 0, i.e. w · x + b = 0 with w = [a c]^T, x = [x y]^T

[Figure: the line with normal direction w, a point (x0, y0), and its distance D to the line]

SLIDE 72

Lines in R2

ax + cy + b = 0, i.e. w · x + b = 0 with w = [a c]^T, x = [x y]^T

Distance from a point (x0, y0) to the line:

D = (a x0 + c y0 + b) / sqrt(a^2 + c^2) = (w^T x + b) / ||w||


SLIDE 74

Linear classifiers

  • Find a linear function to separate positive and negative examples:

x_i positive: w · x_i + b ≥ 0
x_i negative: w · x_i + b < 0

Which line is best?

SLIDE 75

Support Vector Machines (SVMs)

  • Discriminative classifier based on the optimal separating line (for the 2d case)
  • Maximize the margin between the positive and negative training examples

SLIDE 76

Support vector machines

  • Want the line that maximizes the margin.

x_i positive (y_i = 1):  w · x_i + b ≥ 1
x_i negative (y_i = -1): w · x_i + b ≤ -1

For support vectors: w · x_i + b = ±1

[Figure: the margin between the two classes; the support vectors lie on the margin]

  • C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

SLIDE 77

Support vector machines

  • Want the line that maximizes the margin.

x_i positive (y_i = 1):  w · x_i + b ≥ 1
x_i negative (y_i = -1): w · x_i + b ≤ -1

For support vectors: w · x_i + b = ±1

Distance between a point and the line: |w · x_i + b| / ||w||

So the margin is M = |1 − (−1)| / ||w|| = 2 / ||w||

SLIDE 78

Finding the maximum margin line

  • 1. Maximize the margin 2/||w||
  • 2. Correctly classify all training data points:

x_i positive (y_i = 1):  w · x_i + b ≥ 1
x_i negative (y_i = -1): w · x_i + b ≤ -1

  • 3. Quadratic optimization problem:

Minimize (1/2) w^T w subject to y_i (w · x_i + b) ≥ 1

  • C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

SLIDE 79

Finding the maximum margin line

  • Solution: w = Σ_i α_i y_i x_i
  • where x_i is a support vector and α_i its learned weight; the weights α_i are non-zero only at support vectors.

  • C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

SLIDE 80

Finding the maximum margin line

  • Solution: w = Σ_i α_i y_i x_i

b = y_i − w · x_i (for any support vector)

  • Classification function:

f(x) = sign(w · x + b) = sign(Σ_i α_i y_i x_i · x + b)

If f(x) < 0, classify as negative; if f(x) > 0, classify as positive.

Note: it depends on the x_i through dot products only!

  • C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
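The classification function can be sketched directly from a set of support vectors and weights. The toy problem below is an assumption chosen so the math is checkable by hand: support vectors at (1,1) and (−1,−1) with α_i = 0.25 and b = 0 give w = (0.5, 0.5), and the support vectors satisfy w · x_i + b = ±1.

```python
import numpy as np

def svm_decision(x, support_vecs, sv_labels, alphas, b):
    """f(x) = sign( sum_i alpha_i y_i (x_i . x) + b ), i.e. the
    classification function, using dot products only."""
    s = sum(a * y * np.dot(sv, x)
            for a, y, sv in zip(alphas, sv_labels, support_vecs))
    return 1 if s + b > 0 else -1

# Toy separable problem: support vectors at (1,1) [+] and (-1,-1) [-].
sv = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
ys = [1, -1]
alphas = [0.25, 0.25]          # chosen so that w = (0.5, 0.5)
b = 0.0                        # by symmetry of the two support vectors

label = svm_decision(np.array([2.0, 2.0]), sv, ys, alphas, b)
```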

SLIDE 81

Questions

  • What if the features are not 2d?
  • What if the data is not linearly separable?
  • What if we have more than just two categories?
SLIDE 82

Questions

  • What if the features are not 2d?
  • Generalizes to d-dimensions – replace line with “hyperplane”
  • What if the data is not linearly separable?
  • What if we have more than just two categories?
SLIDE 83

Person detection with HoG's & linear SVM's

Dalal & Triggs, CVPR 2005

  • Map each grid cell in the input window to a histogram counting the gradients per orientation.
  • Train a linear SVM using a training set of pedestrian vs. non-pedestrian windows.

Code available: http://pascal.inrialpes.fr/soft/olt/

SLIDE 84

Person detection with HoG’s & linear SVM’s

  • Histograms of Oriented Gradients for Human Detection. Navneet Dalal and Bill Triggs, International Conference on Computer Vision & Pattern Recognition, June 2005.
  • http://lear.inrialpes.fr/pubs/2005/DT05/
SLIDE 85

Questions

  • What if the features are not 2d?
  • What if the data is not linearly separable?
  • What if we have more than just two categories?
SLIDE 86

Non-linear SVMs

  • Datasets that are linearly separable with some noise work out great.
  • But what are we going to do if the dataset is just too hard?
  • How about… mapping the data to a higher-dimensional space, e.g. x → x²?

SLIDE 87

Non-linear SVMs: feature spaces

  • General idea: the original input space can be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)

Slide from Andrew Moore's tutorial: http://www.autonlab.org/tutorials/svm.html

SLIDE 88

The “Kernel” Trick

  • The linear classifier relies on the dot product between vectors: K(x_i, x_j) = x_i^T x_j
  • If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(x_i, x_j) = φ(x_i)^T φ(x_j)
  • A kernel function is a similarity function that corresponds to an inner product in some expanded feature space.

SLIDE 89

Example

2-dimensional vectors x = [x1 x2]; let K(x_i, x_j) = (1 + x_i^T x_j)^2

Need to show that K(x_i, x_j) = φ(x_i)^T φ(x_j):

K(x_i, x_j) = (1 + x_i^T x_j)^2
  = 1 + x_i1^2 x_j1^2 + 2 x_i1 x_j1 x_i2 x_j2 + x_i2^2 x_j2^2 + 2 x_i1 x_j1 + 2 x_i2 x_j2
  = [1  x_i1^2  √2 x_i1 x_i2  x_i2^2  √2 x_i1  √2 x_i2]^T [1  x_j1^2  √2 x_j1 x_j2  x_j2^2  √2 x_j1  √2 x_j2]
  = φ(x_i)^T φ(x_j),  where φ(x) = [1  x1^2  √2 x1 x2  x2^2  √2 x1  √2 x2]

from Andrew Moore's tutorial: http://www.autonlab.org/tutorials/svm.html
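The identity above is easy to check numerically: the quadratic kernel evaluated directly matches the dot product of the explicit 6-dimensional features. The two test vectors are arbitrary.

```python
import numpy as np

def K(xi, xj):
    """Quadratic kernel K(xi, xj) = (1 + xi^T xj)^2, evaluated directly."""
    return (1.0 + np.dot(xi, xj)) ** 2

def phi(x):
    """The explicit 6-dimensional lifting from the derivation above."""
    x1, x2 = x
    return np.array([1.0, x1 ** 2, np.sqrt(2) * x1 * x2,
                     x2 ** 2, np.sqrt(2) * x1, np.sqrt(2) * x2])

xi = np.array([0.7, -1.2])     # arbitrary test vectors
xj = np.array([2.0, 0.5])
lhs = K(xi, xj)
rhs = float(np.dot(phi(xi), phi(xj)))
```

The point of the trick: `K` costs one 2D dot product, while `phi` materializes 6 coordinates; for higher-degree kernels the gap between the two costs grows explosively.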

SLIDE 90

Nonlinear SVMs

  • The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that K(x_i, x_j) = φ(x_i) · φ(x_j)
  • This gives a nonlinear decision boundary in the original feature space:

Σ_i α_i y_i (x_i^T x) + b  →  Σ_i α_i y_i K(x_i, x) + b

SLIDE 91

Examples of kernel functions

Linear: K(x_i, x_j) = x_i^T x_j

Gaussian RBF: K(x_i, x_j) = exp(−||x_i − x_j||^2 / 2σ^2)

Histogram intersection: K(x_i, x_j) = Σ_k min(x_i(k), x_j(k))

The Gaussian RBF kernel corresponds to an infinite-dimensional feature map, since

exp(−||x − x′||^2 / 2) = Σ_{j=0}^{∞} ( (x^T x′)^j / j! ) exp(−||x||^2 / 2) exp(−||x′||^2 / 2)
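The three kernels above are each a few lines of numpy. A minimal sketch; the test vectors are arbitrary non-negative histograms so that histogram intersection makes sense.

```python
import numpy as np

def k_linear(xi, xj):
    """Linear kernel: plain dot product."""
    return float(np.dot(xi, xj))

def k_gaussian_rbf(xi, xj, sigma=1.0):
    """Gaussian RBF: exp(-||xi - xj||^2 / (2 sigma^2))."""
    return float(np.exp(-np.sum((xi - xj) ** 2) / (2.0 * sigma ** 2)))

def k_hist_intersection(xi, xj):
    """Histogram intersection: sum of bin-wise minima."""
    return float(np.minimum(xi, xj).sum())

a = np.array([0.2, 0.5, 0.3])   # arbitrary normalized histograms
b = np.array([0.4, 0.1, 0.5])
```

Note the differing behavior: the RBF kernel of any vector with itself is exactly 1, while the linear kernel is unbounded; histogram intersection of two normalized histograms is at most 1.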

SLIDE 92

SVMs for recognition

  • 1. Define your representation for each example.
  • 2. Select a kernel function.
  • 3. Compute pairwise kernel values between labeled examples.
  • 4. Use this "kernel matrix" to solve for SVM support vectors & weights.
  • 5. To classify a new example: compute kernel values between the new input and the support vectors, apply weights, check the sign of the output.
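Step 5 of the recipe can be sketched on its own, assuming steps 1-4 already produced support vectors, labels, weights, and a bias. The RBF kernel and the toy support vectors below are hypothetical.

```python
import numpy as np

def kernel_classify(x_new, support_vecs, sv_labels, alphas, b, kernel):
    """Step 5: kernel values between the new input and each support
    vector, apply the weights, check the sign of the output."""
    s = sum(a * y * kernel(sv, x_new)
            for a, y, sv in zip(alphas, sv_labels, support_vecs))
    return 1 if s + b > 0 else -1

# Hypothetical solved model: one positive and one negative support vector.
rbf = lambda u, v: float(np.exp(-np.sum((u - v) ** 2) / 2.0))
svs = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]

label = kernel_classify(np.array([0.9, 1.1]), svs, [1, -1], [1.0, 1.0], 0.0, rbf)
```

A query near the positive support vector gets a kernel value near 1 against it and near 0 against the far-away negative one, so the weighted sum is positive.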

SLIDE 93

Example: learning gender with SVMs

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002; Moghaddam and Yang, Face & Gesture 2000.

SLIDE 94

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

[Figure: face alignment processing and the resulting processed faces]

SLIDE 95

  • Training examples:
  • 1044 males
  • 713 females
  • Experiment with various kernels, select Gaussian RBF

Learning gender with SVMs

K(x_i, x_j) = exp(−||x_i − x_j||^2 / 2σ^2)

SLIDE 96

Support Faces

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

SLIDE 97

Classifier Performance

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

SLIDE 98

Gender perception experiment: How well can humans do?

  • Subjects:
  • 30 people (22 male, 8 female)
  • Ages mid-20’s to mid-40’s
  • Test data:
  • 254 face images (60% males, 40% females)
  • Low res and high res versions
  • Task:
  • Classify as male or female, forced choice
  • No time limit

Moghaddam and Yang, Face & Gesture 2000.

SLIDE 99

Human Performance

Moghaddam and Yang, Face & Gesture 2000.

SLIDE 100

Careful how you do things?


SLIDE 102

Human vs. Machine

  • SVMs performed better than any single human test subject, at either resolution

SLIDE 103

Hardest examples for humans

Moghaddam and Yang, Face & Gesture 2000.

SLIDE 104

Questions

  • What if the features are not 2d?
  • What if the data is not linearly separable?
  • What if we have more than just two categories?
SLIDE 105

Multi-class SVMs

  • Achieve a multi-class classifier by combining a number of binary classifiers

  • One vs. all
  • Training: learn an SVM for each class vs. the rest
  • Testing: apply each SVM to the test example and assign it the class of the SVM that returns the highest decision value

  • One vs. one
  • Training: learn an SVM for each pair of classes
  • Testing: each learned SVM "votes" for a class to assign to the test example
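The one-vs-all rule can be sketched independently of how the binary classifiers are trained. The linear scorers below are hypothetical stand-ins for per-class SVM decision values.

```python
import numpy as np

def one_vs_all_predict(x, binary_scorers):
    """binary_scorers[c] returns a decision value for 'class c vs rest';
    assign the class whose scorer returns the highest value."""
    scores = [f(x) for f in binary_scorers]
    return int(np.argmax(scores))

# Hypothetical linear decision functions for 3 classes in 2D.
ws = [np.array([1.0, 0.0]),     # class 0: large first coordinate
      np.array([0.0, 1.0]),     # class 1: large second coordinate
      np.array([-1.0, -1.0])]   # class 2: both coordinates negative
scorers = [lambda x, w=w: float(np.dot(w, x)) for w in ws]

cls = one_vs_all_predict(np.array([0.2, 2.0]), scorers)
```

One vs. one would instead train a scorer per class pair and tally votes; it needs more classifiers (one per pair) but each is trained on a smaller, easier two-class problem.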

SLIDE 106

SVMs: Pros and cons

  • Pros
  • Many publicly available SVM packages: http://www.kernel-machines.org/software and http://www.csie.ntu.edu.tw/~cjlin/libsvm/
  • Kernel-based framework is very powerful, flexible
  • Often a sparse set of support vectors – compact at test time
  • Work very well in practice, even with very small training sample sizes

  • Cons
  • No "direct" multi-class SVM; must combine two-class SVMs
  • Can be tricky to select the best kernel function for a problem
  • Computation, memory: during training, must compute the matrix of kernel values for every pair of examples
  • Learning can take a very long time for large-scale problems

Adapted from Lana Lazebnik

SLIDE 107

Window-based detection: strengths

  • Sliding window detection and global appearance descriptors:
  • Simple detection protocol to implement
  • Good feature choices critical
  • Past successes for certain classes

Kristen Grauman

SLIDE 108

Window-based detection: Limitations

  • High computational complexity
  • For example: 250,000 locations x 30 orientations x 4 scales = 30,000,000 evaluations!
  • If training binary detectors independently, cost increases linearly with the number of classes
  • With so many windows, the false positive rate had better be low

Kristen Grauman

SLIDE 109

Limitations (continued)

  • Not all objects are “box” shaped

Kristen Grauman

SLIDE 110

Limitations (continued)

  • Non-rigid, deformable objects are not captured well with representations assuming a fixed 2d structure, or must assume a fixed viewpoint
  • Objects with less-regular textures are not captured well with holistic appearance-based descriptions

Kristen Grauman

SLIDE 111

Limitations (continued)

  • If considering windows in isolation, context is lost

[Figure: sliding window vs. the detector's view. Figure credit: Derek Hoiem]

Kristen Grauman

SLIDE 112

Limitations (continued)

  • In practice, often entails a large, cropped training set (expensive)
  • Requiring a good match to a global appearance description can lead to sensitivity to partial occlusions

Image credit: Adam, Rivlin, & Shimshoni

Kristen Grauman

SLIDE 113

Summary

  • Basic pipeline for window-based detection
  • Model/representation/classifier choice
  • Sliding window and classifier scoring
  • Boosting classifiers: general idea
  • Viola-Jones face detector
  • Exemplar of the basic paradigm
  • Plus key ideas: rectangular features, AdaBoost for feature selection, cascade
  • Pros and cons of window-based detection