SLIDE 1

Lecture 10: Linear Discriminant Functions (2)

  • Dr. Chengjiang Long

Computer Vision Researcher at Kitware Inc. Adjunct Professor at RPI. Email: longc3@rpi.edu

SLIDE 2

Recap Previous Lecture

SLIDE 3

Outline

  • Perceptron Rule
  • Minimum Squared-Error Procedure
  • Ho-Kashyap Procedure

SLIDE 4

Outline

  • Perceptron Rule
  • Minimum Squared-Error Procedure
  • Ho-Kashyap Procedure

SLIDE 5

"Dual" Problem

  • Seek a hyperplane that separates patterns from different categories
  • Equivalently, seek a hyperplane that puts the normalized patterns on the same (positive) side

Classification rule: if αᵗyi > 0 assign yi to ω1, else if αᵗyi < 0 assign yi to ω2

SLIDE 6

Perceptron rule

  • Use gradient descent, taking the error function to be minimized as the perceptron criterion:

Jp(α) = Σ_{y∈Y(α)} (−αᵗy)

where Y(α) is the set of samples misclassified by α.

  • If Y(α) is empty, Jp(α) = 0; otherwise, Jp(α) ≥ 0.
  • Jp(α) is ||α|| times the sum of the distances of the misclassified samples to the decision boundary.
  • Jp(α) is piecewise linear and thus suitable for gradient descent.

SLIDE 7

Perceptron Batch Rule

  • The gradient of Jp(α) = Σ_{y∈Y(α)} (−αᵗy) is:

∇Jp(α) = Σ_{y∈Y(α)} (−y)

  • The perceptron update rule is obtained using gradient descent:

α(k+1) = α(k) + η(k) Σ_{y∈Y(α(k))} y

  • It is not possible to solve ∇Jp(α) = 0 analytically.
  • It is called the batch rule because it is based on all misclassified examples (a sketch follows below).
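
For concreteness, here is a minimal NumPy sketch of the batch rule on augmented, normalized samples; the toy data, zero initialization, learning rate, and iteration cap are illustrative assumptions, not values from the slides.

```python
import numpy as np

def batch_perceptron(Y, eta=1.0, max_iter=1000):
    """Batch perceptron rule on an (n, d+1) matrix Y of augmented,
    normalized samples (class-2 rows already multiplied by -1),
    so that a solution vector a satisfies Y @ a > 0 for every row."""
    a = np.zeros(Y.shape[1])                      # initial weight vector
    for _ in range(max_iter):
        misclassified = Y[Y @ a <= 0]             # Y(a): rows with a^t y <= 0
        if len(misclassified) == 0:               # Jp(a) = 0: solution found
            return a
        a = a + eta * misclassified.sum(axis=0)   # a <- a + eta * sum of misclassified y
    return a                                      # may not separate non-separable data

# Hypothetical toy data (already augmented and normalized):
Y = np.array([[ 1.0,  2.0,  1.0],    # class 1 sample (2, 1)
              [ 1.0,  4.0,  3.0],    # class 1 sample (4, 3)
              [-1.0, -1.0, -3.0],    # class 2 sample (1, 3), negated
              [-1.0, -2.0, -5.0]])   # class 2 sample (2, 5), negated
a = batch_perceptron(Y)
print(a, Y @ a)                      # all entries of Y @ a should be positive
```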

SLIDE 8

Perceptron Single Sample Rule

  • The gradient descent single sample rule for Jp(α) is:

α(k+1) = α(k) + η(k) yM

– yM is one sample misclassified by α(k)
– Must have a consistent way of visiting the samples (e.g. cyclically), as sketched below

  • Geometric interpretation:

– yM is on the wrong side of the decision hyperplane
– Adding ηyM to α moves the new decision hyperplane in the right direction with respect to yM
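
A corresponding sketch of the single sample rule, visiting the samples in a fixed cyclic order; the epoch cap is an assumption.

```python
import numpy as np

def single_sample_perceptron(Y, eta=1.0, max_epochs=100):
    """Single sample perceptron rule on augmented, normalized samples Y."""
    a = np.zeros(Y.shape[1])
    for _ in range(max_epochs):
        updated = False
        for y in Y:                      # consistent (cyclic) order of visits
            if a @ y <= 0:               # y is misclassified by the current a
                a = a + eta * y          # a(k+1) = a(k) + eta * y_M
                updated = True
        if not updated:                  # a full pass with no update: done
            return a
    return a
```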

SLIDE 9

Perceptron Single Sample Rule

SLIDE 10

Perceptron Example

  • Class 1: students who get A
  • Class 2: students who get F

SLIDE 11

Perceptron Example

  • Augment the samples by adding an extra feature (dimension) equal to 1

SLIDE 12

Perceptron Example

  • Normalize: multiply each augmented sample from class 2 by −1, so that a single inequality αᵗyi > 0 characterizes correct classification (see the sketch below)
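
A small sketch of the augmentation and normalization steps; the toy feature vectors are hypothetical (the student data from the slides is not reproduced here).

```python
import numpy as np

def augment_and_normalize(X1, X2):
    """Build the matrix Y of augmented, 'normalized' samples.

    X1, X2 : arrays of shape (n1, d) and (n2, d) with the raw features
    of class 1 and class 2. Each sample gets a leading 1, and class-2
    samples are negated, so a separating a satisfies Y @ a > 0."""
    Y1 = np.hstack([np.ones((len(X1), 1)), X1])     # augment class 1
    Y2 = -np.hstack([np.ones((len(X2), 1)), X2])    # augment and negate class 2
    return np.vstack([Y1, Y2])

# Hypothetical toy data:
X1 = np.array([[2.0, 1.0], [4.0, 3.0]])   # class 1
X2 = np.array([[1.0, 3.0], [2.0, 5.0]])   # class 2
Y = augment_and_normalize(X1, X2)
```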

SLIDE 13

Perceptron Example

  • Single Sample Rule:

SLIDE 14

Perceptron Example

  • Set equal initial weights
  • Visit all samples sequentially, modifying the weights after each misclassified example

  • New weights

SLIDE 15

Perceptron Example

  • New weights

SLIDE 16

Perceptron Example

  • New weights

SLIDE 17

Perceptron Example

  • Thus the discriminant function is:
  • Converting back to the original features x:

SLIDE 18

Perceptron Example

  • Converting back to the original features x:
  • This is just one possible solution vector.
  • In this solution, being tall is the least important feature
  • If we started with different initial weights, the solution would be [-1, 1.5, -0.5, -1, -1]

SLIDE 19

LDF: Non-separable Example

  • Suppose we have 2 features and the samples are:

– Class 1: [2,1], [4,3], [3,5]
– Class 2: [1,3] and [5,6]

  • These samples are not separable by a line
  • Still would like to get approximate separation by a line

– A good choice is shown in green
– Some samples may be "noisy", and we could accept them being misclassified

SLIDE 20

LDF: Non-separable Example

  • Obtain the normalized augmented samples y1, …, y5 by adding an extra feature and "normalizing"

SLIDE 21

LDF: Non-separable Example

  • Apply Perceptron single sample algorithm
  • Initial equal weights
  • Fixed learning rate

SLIDE 22

LDF: Non-separable Example

SLIDE 23

LDF: Non-separable Example

SLIDE 24

LDF: Non-separable Example

SLIDE 25

LDF: Non-separable Example

SLIDE 26

LDF: Non-separable Example

  • We can continue this forever.
  • There is no solution vector a satisfying aᵗyi > 0 for all yi
  • Need to stop, but at a good point
  • Will not converge in the non-separable case
  • To ensure convergence we can set a decaying learning rate, e.g. η(k) = η(1)/k
  • However, we are not guaranteed that we will stop at a good point

SLIDE 27

Convergence of Perceptron Rules

  • If classes are linearly separable and we use a fixed learning rate, that is η(k) = const
  • Then both the single sample and batch perceptron rules converge to a correct solution (could be any a in the solution space)
  • If classes are not linearly separable:

– The algorithm does not stop; it keeps looking for a solution which does not exist
– By choosing an appropriate learning rate, we can always ensure convergence
– For example, the inverse linear learning rate η(k) = η(1)/k (sketched below)
– For the inverse linear learning rate, convergence in the linearly separable case can also be proven
– No guarantee that we stopped at a good point, but there are good reasons to choose the inverse linear learning rate
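
A sketch of the single sample rule with the inverse linear schedule η(k) = η(1)/k; the update cap and the choice of which misclassified sample to use are assumptions.

```python
import numpy as np

def perceptron_inverse_linear(Y, eta1=1.0, max_updates=10000):
    """Single sample perceptron with decaying learning rate eta(k) = eta1 / k."""
    a = np.zeros(Y.shape[1])
    k = 1
    while k <= max_updates:
        misclassified = [y for y in Y if a @ y <= 0]
        if not misclassified:                    # separable case: a solution was found
            return a
        a = a + (eta1 / k) * misclassified[0]    # update with one misclassified sample
        k += 1
    return a                                     # non-separable case: forced to stop here
```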

SLIDE 28

Perceptron Rule and Gradient Descent

  • Linearly separable data: the perceptron rule with gradient descent works well
  • Linearly non-separable data: need to stop the perceptron rule at a good point, and this may be tricky

SLIDE 29

Outline

  • Perceptron Rule
  • Minimum Squared-Error Procedure
  • Ho-Kashyap Procedure

SLIDE 30

Minimum Squared-Error Procedures

  • Idea: convert the problem to an easier and better understood one
  • MSE procedure

– Choose positive constants b1, b2, …, bn
– Try to find a weight vector a such that aᵗyi = bi for all samples yi
– If we can find such a vector, then a is a solution because the bi's are positive
– Consider all the samples (not just the misclassified ones)

SLIDE 31

MSE Margins

  • If aᵗyi = bi, yi must be at distance bi from the separating hyperplane (normalized by ||a||)
  • Thus b1, b2, …, bn give relative expected distances or "margins" of the samples from the hyperplane
  • Should make bi small if sample i is expected to be near the separating hyperplane, and large otherwise
  • In the absence of any additional information, set b1 = b2 = … = bn = 1

SLIDE 32

MSE Matrix Notation

  • Need to solve n equations aᵗyi = bi, for i = 1, …, n
  • In matrix form: Ya = b

SLIDE 33

Exact Solution is Rare

  • Need to solve a linear system Ya = b

– Y is an n×(d +1) matrix

  • Exact solution only if Y is square and non-singular (the inverse Y⁻¹ exists)

– a = Y⁻¹b
– (number of samples) = (number of features + 1)
– Almost never happens in practice
– Guaranteed to find the separating hyperplane

SLIDE 34

Approximate Solution

  • Typically Y is overdetermined, that is, it has more rows (examples) than columns (features)

– If it has more features than examples, should reduce the dimensionality

  • Need Ya = b, but no exact solution exists for an overdetermined system of equations

– More equations than unknowns

  • Find an approximate solution

– Note that an approximate solution a does not necessarily give a separating hyperplane in the separable case
– But the hyperplane corresponding to a may still be a good solution, especially if there is no separating hyperplane

SLIDE 35

MSE Criterion Function

  • Minimum squared error approach: find a which minimizes the length of the error vector e = Ya − b
  • Thus minimize the minimum squared error criterion function:

Js(a) = ||Ya − b||² = Σ_{i=1}^{n} (aᵗyi − bi)²

  • Unlike the perceptron criterion function, we can optimize the minimum squared error criterion function analytically by setting the gradient to 0

SLIDE 36

Computing the Gradient

  • ∇Js(a) = 2Yᵗ(Ya − b)

SLIDE 37

Pseudo-Inverse Solution

  • Setting the gradient to 0 gives the normal equations YᵗY a = Yᵗb
  • The matrix YᵗY is square (it has d+1 rows and columns) and it is often non-singular
  • If YᵗY is non-singular, its inverse exists and we can solve for a uniquely:

a = (YᵗY)⁻¹Yᵗb = Y⁺b,   where Y⁺ = (YᵗY)⁻¹Yᵗ is the pseudo-inverse of Y (a sketch follows below)
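
A minimal NumPy sketch of the pseudo-inverse (MSE) solution; np.linalg.lstsq is used instead of forming (YᵗY)⁻¹ explicitly, and the default margin vector b of all ones is the choice suggested earlier in the slides.

```python
import numpy as np

def mse_solution(Y, b=None):
    """Minimum squared-error solution a of Y a ≈ b.

    Y : (n, d+1) matrix of augmented, normalized samples.
    b : (n,) vector of positive margins; defaults to all ones."""
    if b is None:
        b = np.ones(Y.shape[0])
    # Least-squares solve; equivalent to pinv(Y) @ b but numerically safer.
    a, *_ = np.linalg.lstsq(Y, b, rcond=None)
    return a

# a gives a separating hyperplane only if all entries of Y @ a are positive.
```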

SLIDE 38

MSE Procedures

  • Only guaranteed a separating hyperplane if Ya > 0

– That is, if all elements of the vector Ya are positive

  • If the errors ε1, …, εn (where εi = aᵗyi − bi) are small relative to b1, …, bn, then each element of Ya is positive, and a gives a separating hyperplane

– If the approximation is not good, εi may be large and negative for some i; then bi + εi will be negative and a is not a separating hyperplane

  • In the linearly separable case, the least squares solution a does not necessarily give a separating hyperplane

– since bi + εi may be negative for some i

SLIDE 39

MSE Procedures

  • We are free to choose b. We may be tempted to make b large as a way to ensure Ya = b > 0

– Does not work
– Let β be a positive scalar; let's try βb instead of b

  • If a* is a least squares solution to Ya = b, then for any scalar β, the least squares solution to Ya = βb is βa*
  • Thus if the i-th element of Ya* is less than 0, that is (Ya*)i < 0, then (Y(βa*))i = β(Ya*)i < 0 as well

– The relative difference between components of b matters, but not the size of each individual component

SLIDE 40

LDF using MSE: Example 1

  • Class 1: (6 9), (5 7)
  • Class 2: (5 9), (0 4)
  • Add extra feature and “normalize”

SLIDE 41

LDF using MSE: Example 1

  • Choose the margin vector b
  • In Matlab, a = Y\b solves the least squares problem
  • Note a is an approximation to Ya = b, since no exact solution exists
  • This solution gives a separating hyperplane, since Ya > 0 (a NumPy version is sketched below)
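
For reference, the same computation in NumPy (the slide uses MATLAB's a = Y\b); the samples are those of Example 1, while b = ones is an assumption, since the chosen margin vector is not shown in this extraction.

```python
import numpy as np

# Augmented, "normalized" samples for Example 1:
# class 1: (6, 9), (5, 7); class 2: (5, 9), (0, 4) with the sign flipped.
Y = np.array([[ 1.0,  6.0,  9.0],
              [ 1.0,  5.0,  7.0],
              [-1.0, -5.0, -9.0],
              [-1.0,  0.0, -4.0]])
b = np.ones(4)                                # assumed margin vector
a, *_ = np.linalg.lstsq(Y, b, rcond=None)     # NumPy analogue of MATLAB a = Y \ b
print(a, Y @ a)                               # separating hyperplane iff all of Y @ a > 0
```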

SLIDE 42

LDF using MSE: Example 2

  • Class 1: (6 9), (5 7)
  • Class 2: (5 9), (0 10)
  • The last sample is very far from the separating hyperplane compared to the others

SLIDE 43

LDF using MSE: Example 2

  • Choose the margin vector b
  • In Matlab, a = Y\b solves the least squares problem
  • This solution does not provide a separating hyperplane, since some components of Ya are negative

SLIDE 44

LDF using MSE: Example 2

  • MSE pays too much attention to isolated "noisy" examples

– such examples are called outliers

  • No problems with convergence
  • Solution ranges from reasonable to good

SLIDE 45

LDF using MSE: Example 2

  • We can see that the 4th point is very far from the separating hyperplane

– In practice we don't know this

  • A more appropriate b could be chosen, e.g. one giving the distant 4th sample a larger margin
  • In Matlab, a = Y\b solves the least squares problem
  • This solution gives a separating hyperplane, since Ya > 0

SLIDE 46

Gradient Descent for MSE

  • May wish to find the MSE solution by gradient descent instead:

1. Computing the inverse of YᵗY may be too costly

2. YᵗY may be close to singular if the samples are highly correlated (rows of Y are almost linear combinations of each other), so computing its inverse is not numerically stable

  • As shown before, the gradient is ∇Js(a) = 2Yᵗ(Ya − b)

SLIDE 47

Widrow-Hoff Procedure

  • Thus the update rule for gradient descent is:

a(k+1) = a(k) + η(k) Yᵗ(b − Y a(k))

  • If η(k) = η(1)/k, then a(k) converges to the MSE solution a, that is, to an a satisfying Yᵗ(Ya − b) = 0
  • The Widrow-Hoff procedure reduces storage requirements by considering single samples sequentially (a sketch follows below):

a(k+1) = a(k) + η(k) (bk − aᵗ(k) yk) yk,   where yk is the sample visited at step k and bk its margin
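
A minimal sketch of the Widrow-Hoff (LMS) procedure with the inverse linear learning rate; the epoch count and η(1) are assumptions.

```python
import numpy as np

def widrow_hoff(Y, b, eta1=0.1, n_epochs=100):
    """Widrow-Hoff (LMS): single-sample gradient descent on ||Ya - b||^2."""
    a = np.zeros(Y.shape[1])
    k = 1
    for _ in range(n_epochs):
        for y_k, b_k in zip(Y, b):               # visit samples cyclically
            eta = eta1 / k                       # eta(k) = eta(1) / k
            a = a + eta * (b_k - a @ y_k) * y_k  # a(k+1) = a(k) + eta(k)(b_k - a^t y_k) y_k
            k += 1
    return a
```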

SLIDE 48

Outline

  • Perceptron Rule
  • Minimum Squared-Error Procedure
  • Ho-Kashyap Procedure

SLIDE 49

Ho-Kashyap Procedure

  • In the MSE procedure, if b is chosen arbitrarily, finding a separating hyperplane is not guaranteed.
  • Suppose the training samples are linearly separable. Then there is an a* and a positive b* s.t. Ya* = b* > 0
  • If we knew b*, we could apply the MSE procedure to find the separating hyperplane
  • Idea: find both a* and b*
  • Minimize the following criterion function, restricting b to be positive:

JHK(a, b) = ||Ya − b||²,   subject to b > 0

SLIDE 50

Ho-Kashyap Procedure

  • As usual, take partial derivatives w.r.t. a and b
  • Use a modified gradient descent procedure to find a minimum of JHK(a,b)
  • Alternate the two steps below until convergence:

(1) Fix b and minimize JHK(a,b) with respect to a
(2) Fix a and minimize JHK(a,b) with respect to b

SLIDE 51

Ho-Kashyap Procedure

  • Step (1) can be performed with pseudoinverse
  • For fixed b, the minimum of JHK(a,b) with respect to a is found by solving YᵗY a = Yᵗb
  • Thus a = Y⁺b, where Y⁺ = (YᵗY)⁻¹Yᵗ is the pseudo-inverse
  • Alternate the two steps below until convergence:

(1) Fix b and minimize JHK(a,b) with respect to a
(2) Fix a and minimize JHK(a,b) with respect to b

SLIDE 52

Ho-Kashyap Procedure

  • Step 2: fix a and minimize JHK(a,b) with respect to b
  • We can’t use b = Ya because b has to be positive
  • Solution: use modified gradient descent
  • Regular gradient descent rule: b(k+1) = b(k) − η(k) ∇b JHK, where ∇b JHK = −2(Ya − b)
  • If any components of ∇b JHK are positive, b will decrease and can possibly become negative

SLIDE 53

Ho-Kashyap Procedure

  • Start with a positive b, follow the negative gradient, but refuse to decrease any components of b
  • This can be achieved by setting all the positive components of ∇b JHK to 0:

b(k+1) = b(k) − η(k) · ½ [∇b JHK − |∇b JHK|]

here |v| denotes the vector we get after applying the absolute value to all elements of v

  • Not doing steepest descent anymore, but we are still doing descent, and we ensure that b stays positive

SLIDE 54

Ho-Kashyap Procedure

  • Let e = Ya − b and e⁺ = ½ (e + |e|). Then the update for b can be written as

b(k+1) = b(k) + 2η(k) e⁺(k)

SLIDE 55

Ho-Kashyap Procedure

  • The final Ho-Kashyap procedure alternates a(k) = Y⁺ b(k) with the b update above (a sketch follows below).
  • For convergence, the learning rate should be fixed with 0 < η < 1.
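
A minimal NumPy sketch of the Ho-Kashyap procedure as summarized on the preceding slides; the initial b of all ones and the iteration cap are assumptions.

```python
import numpy as np

def ho_kashyap(Y, eta=0.9, max_iter=10000):
    """Ho-Kashyap: jointly find a and a positive margin vector b.

    Returns (a, b, separable), where separable is True if Ya > 0 was reached
    and False if non-separability was detected or the iteration cap was hit."""
    Yp = np.linalg.pinv(Y)                  # pseudo-inverse Y+
    b = np.ones(Y.shape[0])                 # b(1) > 0 (assumed: all ones)
    a = Yp @ b                              # a(1) = Y+ b(1)
    for _ in range(max_iter):
        e = Y @ a - b                       # error vector e = Ya - b
        if np.all(Y @ a > 0):               # Ya > 0: separating vector found
            return a, b, True
        if np.all(e <= 0):                  # e <= 0 with some e < 0: non-separable
            return a, b, False
        e_plus = 0.5 * (e + np.abs(e))      # positive part of e
        b = b + 2.0 * eta * e_plus          # b(k+1) = b(k) + 2*eta*e+(k)
        a = Yp @ b                          # a(k+1) = Y+ b(k+1)
    return a, b, False                      # no decision within the iteration cap
```

The example on the later slides uses η = 0.9; it could be run by building Y from those samples and calling ho_kashyap(Y, eta=0.9).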

SLIDE 56

Ho-Kashyap Procedure

  • What if e(k) is negative in all components? Then e⁺(k) = 0 and the corrections stop
  • Write out: e(k) = Y a(k) − b(k)
  • Multiply by Yᵗ: Yᵗ e(k) = YᵗY a(k) − Yᵗ b(k) = 0, since a(k) = (YᵗY)⁻¹Yᵗ b(k)
  • Thus Yᵗ e(k) = 0

SLIDE 57

Ho-Kashyap Procedure

  • Suppose the training samples are linearly separable. Then there is an a* and a positive b* s.t. Ya* = b* > 0
  • Multiply both sides by eᵗ(k): eᵗ(k) Y a* = eᵗ(k) b*; the left side is (Yᵗ e(k))ᵗ a* = 0, so eᵗ(k) b* = 0
  • Since b* > 0, either e(k) = 0 or one of the components of e(k) is positive

SLIDE 58

Ho-Kashyap Procedure

  • In the linearly separable case, either

– e(k) = 0: we have found a solution, stop, or
– one of the components of e(k) is positive: the algorithm continues

  • In the non-separable case,

– e(k) will eventually have only negative components, thus giving a proof of non-separability

  • There is no bound on how many iterations are needed for the proof of non-separability

SLIDE 59

Example

  • Class 1: (6,9), (5,7)
  • Class 2: (5,9), (0, 10)
  • Matrix Y is built by augmenting and "normalizing" the samples
  • Use a fixed learning rate η = 0.9
  • Start with an initial b(1) > 0 and a(1) = Y⁺ b(1)
  • At the start, compute e(1) = Y a(1) − b(1)

SLIDE 60

Example

  • Iteration 1:
  • Solve for b(2) using b(1) and e⁺(1):   b(2) = b(1) + 2η e⁺(1)
  • Solve for a(2) using b(2):   a(2) = Y⁺ b(2)

SLIDE 61

Example

  • Continue iterations until Ya > 0
  • In practice, continue until the minimum component of Ya exceeds a small positive threshold (0.01)
  • After 104 iterations, the procedure converged to a solution
  • a does give a separating hyperplane

SLIDE 62

LDF Summary

  • Perceptron procedures

– Find a separating hyperplane in the linearly separable case
– Do not converge in the non-separable case
– Can force convergence by using a decreasing learning rate, but are not guaranteed a reasonable stopping point

  • MSE procedures

– Converge in both the separable and the non-separable case
– May not find a separating hyperplane even if the classes are linearly separable
– Use the pseudoinverse if YᵗY is not singular and not too large
– Use gradient descent (the Widrow-Hoff procedure) otherwise

  • Ho-Kashyap procedures

– Always converge
– Find a separating hyperplane in the linearly separable case
– More costly
