Lecture 18: Recognition IV. Thursday, Nov 15. Prof. Kristen Grauman.



SLIDE 1

Lecture 18: Recognition IV

Thursday, Nov 15

  • Prof. Kristen Grauman
SLIDE 2

Outline

  • Discriminative classifiers

– SVMs

  • Learning categories from weakly supervised images

– Constellation model

  • Shape matching

– Shape context, visual CAPTCHA application

SLIDES 3-5

Recall: boosting

  • Want to select the single feature that best separates positive and negative examples, in terms of weighted error.
  • Each dimension: output of a possible rectangle feature on faces and non-faces.
  • For an image subwindow, pick the optimal threshold, the one that results in minimal misclassifications. Notice that any threshold giving the same error rate would be equally good here.
SLIDES 6-11

Lines in R2

A line: ax + by + d = 0

Let w = [a, b]ᵀ and x = [x, y]ᵀ. Then the line is w · x + d = 0.

Distance from a point (x₀, y₀) to the line:

  D = (ax₀ + by₀ + d) / √(a² + b²) = (wᵀx + d) / ‖w‖
SLIDE 12

Planes in R3

A plane: ax + by + cz + d = 0

Let w = [a, b, c]ᵀ and x = [x, y, z]ᵀ. Then the plane is w · x + d = 0.

Distance from a point (x, y, z) to the plane:

  D = (ax + by + cz + d) / √(a² + b² + c²) = (wᵀx + d) / ‖w‖

SLIDE 13

Hyperplanes in Rn

  w₁x₁ + w₂x₂ + … + wₙxₙ + b = 0

Hyperplane H is the set of all vectors x ∈ Rⁿ which satisfy:

  wᵀx + b = 0

Distance from a point x to the hyperplane:

  D(H, x) = (wᵀx + b) / ‖w‖
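The point-to-hyperplane distance above is one line of code. A minimal NumPy sketch (the library choice is an assumption; the slides are library-agnostic):

```python
import numpy as np

def hyperplane_distance(w, b, x):
    """Signed distance from point x to the hyperplane w^T x + b = 0."""
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)
    return (w @ x + b) / np.linalg.norm(w)

# Line 3x + 4y - 10 = 0 in R^2; the point (2, 1) lies on it.
print(hyperplane_distance([3, 4], -10, [2, 1]))   # 0.0
# Point (6, 4): (18 + 16 - 10) / 5 = 4.8
print(hyperplane_distance([3, 4], -10, [6, 4]))   # 4.8
```

The sign of the result tells you which side of the hyperplane the point is on, which is exactly what the SVM decision rule uses.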

SLIDE 14

Support Vector Machines (SVMs)

  • Discriminative classifier based on the optimal separating hyperplane
  • Which hyperplane is optimal?
SLIDES 15-19

Linear Classifiers

f(x, w, b) = sign(w x + b): input x, estimated label y_est (denotes +1, denotes -1).

The decision boundary is w x + b = 0, with w x + b > 0 on the positive side and w x + b < 0 on the negative side.

How would you classify this data? Many separating lines exist, and any of these would be fine.. but which is best? Choose poorly and some points end up misclassified to the +1 class.

Slides from Andrew Moore's tutorial: http://www.autonlab.org/tutorials/svm.html

SLIDE 20

Classifier Margin

f(x, w, b) = sign(w x + b)

Define the margin of a linear classifier as the width that the boundary could be increased by before hitting a datapoint.
SLIDE 21

Maximum Margin

f(x, w, b) = sign(w x + b)

The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM, a Linear SVM).

Support vectors are those datapoints that the margin pushes up against.

  1. Maximizing the margin is good according to intuition and theory.
  2. It implies that only support vectors are important; other training examples are ignorable.
  3. Empirically it works very, very well.

SLIDE 22

Linear SVM Mathematically

The hyperplanes wx + b = +1 and wx + b = -1 bound the "Predict Class = +1" zone and the "Predict Class = -1" zone, with the decision boundary wx + b = 0 between them; x⁺ and x⁻ are support vectors on the two margins.

Margin width:

  M = 1/‖w‖ − (−1/‖w‖) = 2/‖w‖

For the support vectors, (wᵀx + b)/‖w‖ = ±1/‖w‖: the value of wᵀx + b is +1 for positives and −1 for negatives.

SLIDE 23

Question

  • How should we choose values for w, b?

  1. We want the training data separated by the hyperplane, so that it classifies them correctly.
  2. We want the margin width M as large as possible.

SLIDE 24

Linear SVM Mathematically

Goal:

1) Correctly classify all training data:
  wxᵢ + b ≥ +1 if yᵢ = +1
  wxᵢ + b ≤ −1 if yᵢ = −1
  i.e. yᵢ(wxᵢ + b) ≥ 1 for all i

2) Maximize the margin M = 2/‖w‖, which is the same as minimizing (1/2)wᵀw.

  • Formulated as a quadratic optimization problem, solve for w and b:

  Minimize Φ(w) = (1/2)wᵀw subject to yᵢ(wxᵢ + b) ≥ 1 for all i

SLIDE 25

The Optimization Problem Solution

Solution has the form (omitting derivation):

  w = Σ αᵢyᵢxᵢ
  b = yₖ − wᵀxₖ for any xₖ such that αₖ ≠ 0

Each non-zero αᵢ indicates that the corresponding xᵢ is a support vector.

Then the classifying function will have the form:

  f(x) = Σ αᵢyᵢxᵢᵀx + b

Notice that it relies on an inner product between the test point x and the support vectors xᵢ. Solving the optimization problem also involves computing the inner products xᵢᵀxⱼ between all pairs of training points.
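The solution form can be checked numerically. The sketch below trains a linear SVM with scikit-learn (a library choice the slides do not make; its `dual_coef_` attribute stores the products αᵢyᵢ for the support vectors) and reconstructs f(x) = Σ αᵢyᵢxᵢᵀx + b by hand:

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class data in R^2, linearly separable.
X = np.array([[0., 0.], [1., 1.], [1., 0.], [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e3).fit(X, y)

# dual_coef_ holds alpha_i * y_i for each support vector x_i, so
# f(x) = sum_i alpha_i y_i x_i^T x + b.
x_test = np.array([2., 2.])
f_manual = (clf.dual_coef_ @ (clf.support_vectors_ @ x_test)
            + clf.intercept_).item()
f_sklearn = clf.decision_function([x_test]).item()
print(np.isclose(f_manual, f_sklearn))  # True
```

Note that only the support vectors enter the sum, matching point 2 on slide 21: the other training examples are ignorable.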

SLIDE 26

Non-linear SVMs

Datasets that are linearly separable with some noise work out great. But what are we going to do if the dataset is just too hard? How about mapping the data to a higher-dimensional space, e.g. x → x²?
SLIDE 27

Non-linear SVMs: Feature spaces

General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

  Φ: x → φ(x)

SLIDE 28

The "Kernel Trick"

  • The linear classifier relies on the dot product between vectors: K(xᵢ, xⱼ) = xᵢᵀxⱼ
  • If every data point is mapped into a high-dimensional space via some transformation Φ: x → φ(x), the dot product becomes: K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ)
  • A kernel function is a similarity function that corresponds to an inner product in some expanded feature space.
  • Example: 2-dimensional vectors x = [x₁ x₂]; let K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)².

Need to show that K(xᵢ, xⱼ) = φ(xᵢ)ᵀφ(xⱼ):

  K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)²
  = 1 + xᵢ₁²xⱼ₁² + 2xᵢ₁xⱼ₁xᵢ₂xⱼ₂ + xᵢ₂²xⱼ₂² + 2xᵢ₁xⱼ₁ + 2xᵢ₂xⱼ₂
  = [1, xᵢ₁², √2 xᵢ₁xᵢ₂, xᵢ₂², √2 xᵢ₁, √2 xᵢ₂]ᵀ [1, xⱼ₁², √2 xⱼ₁xⱼ₂, xⱼ₂², √2 xⱼ₁, √2 xⱼ₂]
  = φ(xᵢ)ᵀφ(xⱼ), where φ(x) = [1, x₁², √2 x₁x₂, x₂², √2 x₁, √2 x₂]
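The algebra above can be spot-checked numerically. A small NumPy sketch (library choice is an assumption) comparing K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)² against the explicit feature map φ:

```python
import numpy as np

def K(x, y):
    """Polynomial kernel (1 + x^T y)^2 for 2-D inputs."""
    return (1.0 + x @ y) ** 2

def phi(x):
    """Explicit feature map for K, mapping R^2 -> R^6."""
    x1, x2 = x
    s = np.sqrt(2.0)
    return np.array([1.0, x1 ** 2, s * x1 * x2, x2 ** 2, s * x1, s * x2])

rng = np.random.default_rng(0)
xi, xj = rng.normal(size=2), rng.normal(size=2)
# The kernel value equals the inner product in the expanded feature space.
print(np.isclose(K(xi, xj), phi(xi) @ phi(xj)))  # True
```

This is the point of the trick: evaluating K costs a 2-D dot product, while φ lives in 6 dimensions and is never computed explicitly during training.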

SLIDE 29

Examples of General Purpose Kernel Functions

  • Linear: K(xᵢ, xⱼ) = xᵢᵀxⱼ
  • Polynomial of power p: K(xᵢ, xⱼ) = (1 + xᵢᵀxⱼ)ᵖ
  • Gaussian (radial-basis function network): K(xᵢ, xⱼ) = exp(−‖xᵢ − xⱼ‖² / (2σ²))
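Each of these kernels is a one-liner. A hedged NumPy sketch (function names are illustrative, not from the slides):

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def poly_kernel(xi, xj, p=3):
    return (1.0 + xi @ xj) ** p

def rbf_kernel(xi, xj, sigma=1.0):
    # Gaussian kernel: exp(-||xi - xj||^2 / (2 sigma^2))
    d = xi - xj
    return np.exp(-(d @ d) / (2.0 * sigma ** 2))

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])
print(linear_kernel(x, y))     # -1.5
print(poly_kernel(x, y, p=2))  # (1 - 1.5)^2 = 0.25
print(rbf_kernel(x, x))        # 1.0 at zero distance
```

Note the RBF kernel is bounded in (0, 1] and peaks when the two inputs coincide, which is why it behaves like a similarity function.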

SLIDE 30

SVMs for object recognition

  • 1. Define your representation for each example.
  • 2. Select a kernel function.
  • 3. Compute pairwise kernel values between labeled examples; identify support vectors.
  • 4. Compute kernel values between new inputs and support vectors to classify.
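The four steps above can be sketched end-to-end with a precomputed Gram matrix. scikit-learn and the random toy "representations" below are assumptions for illustration only; real image descriptors would replace them:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Step 1: representation -- random 5-D feature vectors stand in for
# per-image descriptors (purely illustrative).
X_train = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(3, 1, (20, 5))])
y_train = np.array([0] * 20 + [1] * 20)

# Step 2: select a kernel (Gaussian RBF).
def rbf(A, B, sigma=2.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

# Step 3: pairwise kernel values between labeled examples; fitting
# identifies the support vectors.
clf = SVC(kernel='precomputed').fit(rbf(X_train, X_train), y_train)

# Step 4: kernel values between new inputs and training examples classify.
X_new = rng.normal(3, 1, (3, 5))
print(clf.predict(rbf(X_new, X_train)))
```

At test time only the columns corresponding to support vectors actually matter, which keeps classification cheap even with many training images.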

SLIDE 31

Example: learning gender with SVMs

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002. Moghaddam and Yang, Face & Gesture 2000.

SLIDE 32

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

Processed faces Face alignment processing

SLIDE 33

Learning gender with SVMs

  • Training examples:
    – 1044 males
    – 713 females
  • Experiment with various kernels; select Gaussian RBF

SLIDE 34

Support Faces

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

SLIDE 35

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

SLIDE 36

Gender perception experiment: How well can humans do?

  • Subjects:

– 30 people (22 male, 8 female) – Ages mid-20’s to mid-40’s

  • Test data:

– 254 face images (6 males, 4 females) – Low res and high res versions

  • Task:

– Classify as male or female, forced choice – No time limit

Moghaddam and Yang, Face & Gesture 2000.

SLIDE 37

Gender perception experiment: How well can humans do?

(Error-rate plots.)

Moghaddam and Yang, Face & Gesture 2000.

SLIDE 38

Human vs. Machine

  • SVMs perform better than any single human test subject
SLIDE 39

Hardest examples for humans

Moghaddam and Yang, Face & Gesture 2000.

SLIDE 40

Summary: SVM classifiers

  • Discriminative classifier
  • Effective for high-dimensional data
  • Flexibility/modularity due to the kernel
  • Very good performance in practice; widely used in vision applications

SLIDE 41

Outline

  • Discriminative classifiers

– SVMs

  • Learning categories from weakly supervised images

– Constellation model

  • Shape matching

– Shape context, visual CAPTCHA application

SLIDE 42

Weak supervision

  • How can we learn object models in the

presence of clutter?

Vs.

SLIDE 43

Goal

Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

SLIDE 44

Weak supervision

  • Questions:
    – What about categories where an iconic "template" representation is infeasible?
    – What is the object to be recognized / the part of the image we want to build a model for?
    – For that object, what parts are distinctive, or things that can be reliably detected in different instances?

Weber, Welling, Perona. Unsupervised Learning of Models for Recognition, ECCV 2000.

SLIDE 45

Weber, Welling, Perona., 2000.

SLIDES 46-50

Part-based models

Slides by Bill Freeman, MIT

SLIDE 51

Part-based models

Slide by Fei-Fei Li, 2003.

SLIDE 52

One possible constellation model

  • Model class with joint probability density

function on shape and appearance

Figure from Rob Fergus: image patch descriptors (with uncertainty); mutual positions of the parts (with uncertainty).

SLIDE 53

Unsupervised learning of part-based models

Main idea:

  • Use an interest operator to detect small, highly textured regions (on both fg and bg)
    – If training objects have similar appearance, these regions will often be similar in different training examples
  • Cluster patches: large clusters are used to select candidate fg parts
  • Choose the most informative parts while simultaneously estimating model parameters
    – Iteratively try different combinations of a small number of parts, and check model performance on a validation set to evaluate quality

Weber, Welling, Perona, ECCV 2000.

SLIDE 54

Representation

  • Use a scale invariant, scale sensing feature keypoint detector (like the first steps of Lowe's SIFT).

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

SLIDE 55

Features Keys

  • A direct appearance model is taken around each located key. This is then normalized to an 11x11 window. PCA further reduces these features.

From: Rob Fergus http://www.robots.ox.ac.uk/%7Efergus/

SLIDE 56

Slide by Bill Freeman, MIT

SLIDE 57

Candidate parts

Weber, Welling, Perona. Unsupervised Learning of Models for Recognition, 2000.

For faces; for cars. At this point, parts appear in both background and foreground of training images.

SLIDE 58

Model learning

Which of the candidate parts define the class, and in what configuration? Let's assume:

  • We know the number of parts that define the model (and can keep it small).
  • The object of interest is the only consistent thing somewhere in each training image.

Images from Rob Fergus

SLIDE 59

Model learning

Which of the candidate parts define the class, and in what configuration? Initialize the model parameters randomly. Iterate while the fit improves:

  • 1. Find the best assignment in the training images given the parameters
  • 2. Recompute the parameters based on the current features
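The iterate-assign/re-estimate loop above is EM-like. The toy 1-D sketch below is entirely illustrative (not the authors' code): each "image" yields a few candidate feature positions, one of which comes from a consistent part, and the loop recovers the part's position from the clutter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "image" gives 5 candidate feature positions (1-D for simplicity):
# one from the true part (near 5.0), the rest uniform background clutter.
images = [np.concatenate([rng.uniform(0, 20, 4),
                          [5.0 + rng.normal(0, 0.2)]])
          for _ in range(30)]

# Initialize from the largest cluster of candidates (cf. the patch
# clustering step on slide 53), approximated by the densest histogram bin.
all_pts = np.concatenate(images)
counts, edges = np.histogram(all_pts, bins=20, range=(0, 20))
mu = 0.5 * (edges[counts.argmax()] + edges[counts.argmax() + 1])

for _ in range(10):
    # 1. best assignment in each training image given the current parameter
    assigned = np.array([img[np.abs(img - mu).argmin()] for img in images])
    # 2. recompute the parameter from the currently assigned features
    mu = assigned.mean()

print(abs(mu - 5.0) < 1.0)  # the part position is recovered approximately
```

The real model iterates over joint shape-and-appearance parameters for several parts rather than a single scalar, but the alternation has the same structure.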

SLIDES 60-61

Recognition

  • Given a model defining the object class and a model for "background", compute the likelihood ratio to make a Bayesian decision, using the maximum-likelihood parameters.

X: locations, S: scales, A: appearances. Identified in the new image.

SLIDE 62

Example: data from four categories

Slide from Li Fei-Fei http://www.vision.caltech.edu/feifeili/Resume.htm

SLIDES 63-64

Face model. Recognition results. Appearance: 10 patches closest to the mean for each part.

SLIDE 65

Motorbike model. Recognition results. Appearance: 10 patches closest to the mean for each part.

SLIDE 66

Spotted cat model. Recognition results. Appearance: 10 patches closest to the mean for each part.

SLIDE 67

Outline

  • Discriminative classifiers

– SVMs

  • Learning categories from weakly supervised images

– Constellation model

  • Shape matching

– Shape context, visual CAPTCHA application

SLIDE 68

Shape and biology

  • D'Arcy Thompson: On Growth and Form, 1917
    – studied transformations between shapes of organisms

Slides adapted from Belongie, Malik, & Puzicha, Matching Shapes, ICCV 2001. www.eecs.berkeley.edu/Research/Projects/CS/vision/shape/belongie-iccv01

SLIDE 69

Shape matching for recognition

model target

SLIDE 70

Comparing shapes

What points on these two sampled contours are most similar?

SLIDE 71

Shape context descriptor

Count the number of points inside each bin, e.g.: Count = 4, Count = 10, ...

A compact representation of the distribution of points relative to each point.
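A log-polar histogram like this can be sketched in a few lines. The bin counts and radius range below are illustrative choices, not the exact parameters of Belongie et al.:

```python
import numpy as np

def shape_context(points, index, n_r=5, n_theta=12):
    """Log-polar histogram of the other points, relative to points[index].

    Radii are normalized by the mean distance so the descriptor is
    scale invariant; bins outside the radius range are clipped inward.
    """
    pts = np.asarray(points, dtype=float)
    diff = np.delete(pts, index, axis=0) - pts[index]
    r = np.linalg.norm(diff, axis=1)
    theta = np.arctan2(diff[:, 1], diff[:, 0])          # in (-pi, pi]

    r = r / r.mean()                                    # scale invariance
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    r_bin = np.clip(np.digitize(r, r_edges) - 1, 0, n_r - 1)
    t_bin = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta

    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin, t_bin), 1)                  # count per bin
    return hist

# Points sampled on a unit circle; descriptor at the first point.
ang = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.stack([np.cos(ang), np.sin(ang)], axis=1)
h = shape_context(circle, 0)
print(int(h.sum()))  # 39: every other point falls in exactly one bin
```

Computing this histogram at every sampled contour point gives the set of descriptors that the matching step on the next slides compares.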

SLIDE 72

Shape context descriptor

SLIDE 73

Comparing shape contexts

Compute matching costs using the Chi-squared distance. Recover correspondences by solving for the least-cost assignment, using the costs Cᵢⱼ. (Then estimate a parameterized transformation based on these correspondences.)
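Both steps, chi-squared costs plus least-cost assignment, fit in a short sketch; SciPy's Hungarian solver stands in here for whatever assignment method the authors used:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def chi2_cost(H1, H2, eps=1e-10):
    """Chi-squared cost between two sets of flattened histograms:
    C_ij = 0.5 * sum_k (h_i[k] - h_j[k])^2 / (h_i[k] + h_j[k]),
    computed after normalizing each histogram to sum to 1."""
    H1 = H1 / (H1.sum(axis=1, keepdims=True) + eps)
    H2 = H2 / (H2.sum(axis=1, keepdims=True) + eps)
    num = (H1[:, None, :] - H2[None, :, :]) ** 2
    den = H1[:, None, :] + H2[None, :, :] + eps
    return 0.5 * (num / den).sum(axis=-1)

# Toy histograms: rows of H2 are a permutation of the rows of H1.
H1 = np.array([[4., 0., 1.], [0., 3., 2.], [1., 1., 3.]])
H2 = H1[[2, 0, 1]]
C = chi2_cost(H1, H2)
rows, cols = linear_sum_assignment(C)   # least-cost one-to-one assignment
print(cols.tolist())  # [1, 2, 0] -- recovers the permutation
```

In the full pipeline these correspondences then feed the transformation-estimation step mentioned on the slide.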

SLIDE 74

CAPTCHAs

  • CAPTCHA: Completely Automated Public Turing test to Tell Computers and Humans Apart
  • Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford, CMU, 2000.
  • www.captcha.net
SLIDE 75

Shape matching application: breaking a visual CAPTCHA

  • Use shape matching to recognize characters and words in spite of clutter, warping, etc.

Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA, by G. Mori and J. Malik, CVPR 2003.

SLIDE 76

Computer Vision Group, University of California, Berkeley

Fast Pruning: Representative Shape Contexts

  • Pick k points in the image at random
    – Compare to all shape contexts for all known letters
    – Vote for closely matching letters
  • Keep all letters with scores under threshold

Slides by Greg Mori, CVPR 2003

SLIDE 77

Algorithm A: bottom-up

  • Look for letters
    – Representative Shape Contexts
  • Find pairs of letters that are "consistent"
    – Letters nearby in space
  • Search for valid words
  • Give scores to the words
SLIDE 78

EZ-Gimpy Results with Algorithm A

  • 158 of 191 images correctly identified: 83%
    – Running time: ~10 sec. per image (MATLAB, 1 GHz P3)

Examples: horse, smile, canvas, spade, join, here

SLIDE 79

Gimpy

  • Multiple words; the task is to find 3 words in the image
  • Clutter is other objects, not texture
SLIDE 80

Algorithm B: Letters are not enough

  • Hard to distinguish single letters with so much clutter
  • Find words instead of letters
    – Use long-range info over the entire word
    – Stretch shape contexts into ellipses
  • Search problem becomes huge
    – # of words ~600 vs. # of letters 26
    – Prune the set of words using opening/closing bigrams

SLIDE 81

Results with Algorithm B

  # correct words    % of tests (of 24)
  1 or more          92%
  2 or more          75%
  3                  33%
  (EZ-Gimpy          92%)

Examples: dry clear medical; door farm important; card arch plate

SLIDE 82

Coming up

  • Face images
  • For next week:

– Read Trucco & Verri handout on Motion

  • Problem set 4 due 11/29
SLIDE 83

References

  • Unsupervised Learning of Models for Recognition, by M. Weber, M. Welling and P. Perona, ECCV 2000.
  • Towards Automatic Discovery of Object Categories, by M. Weber, M. Welling and P. Perona, CVPR 2000.
  • Object Class Recognition by Unsupervised Scale-Invariant Learning, by R. Fergus, P. Perona, and A. Zisserman, CVPR 2003.
  • Matching Shapes, by S. Belongie, J. Malik and J. Puzicha, ICCV 2001.
  • Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA, by G. Mori and J. Malik, CVPR 2003.
  • Learning Gender with Support Faces, by B. Moghaddam and M.-H. Yang, TPAMI 2002.
  • SVM slides from Andrew Moore, CMU.