Machine Learning & Pattern Recognition
Discriminant Analysis
Aleix M. Martinez
aleix@ece.osu.edu

PCA
Eigenfaces (PCA)
Linear Discriminant Analysis
- If we have samples corresponding to two or more classes, we prefer to select those features that best discriminate between classes, rather than those that best describe the data.
- This will, of course, depend on the classifier.
- Assume our classifier is Bayes.
- Thus, we want to minimize the probability of error.
- We will develop a method based on scatter matrices.
Theorem
- Let the samples of two classes be Normally distributed in \(R^p\), with common covariance matrix \(\Sigma\). Then, the Bayes errors in the p-dimensional space and in the one-dimensional subspace given by \( \mathbf{v} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) / \|\Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\| \) are the same, where \(\|\mathbf{x}\|\) is the Euclidean norm of the vector \(\mathbf{x}\).
- That is, there is no loss in classification when reducing from p dimensions to one.
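The theorem can be checked numerically: sample two Gaussians with a shared covariance, apply the Bayes rule both in \(R^p\) and after projecting onto \(\mathbf{v}\), and compare the empirical errors. The sketch below does this; the particular means, covariance, and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
mu1, mu2 = np.zeros(p), np.ones(p)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)          # shared (common) covariance

# Direction from the theorem: v = Sigma^{-1}(mu1 - mu2), normalized
w = np.linalg.solve(Sigma, mu1 - mu2)
v = w / np.linalg.norm(w)

n = 20000
X1 = rng.multivariate_normal(mu1, Sigma, n)
X2 = rng.multivariate_normal(mu2, Sigma, n)

# Bayes rule in p dimensions (equal priors, shared covariance -> linear rule)
b = -0.5 * w @ (mu1 + mu2)
err_p = 0.5 * (np.mean(X1 @ w + b <= 0) + np.mean(X2 @ w + b > 0))

# Bayes rule after projecting onto v (1-D, equal variances, midpoint threshold)
thresh = 0.5 * (v @ mu1 + v @ mu2)       # note v @ mu1 > v @ mu2 here
err_1 = 0.5 * (np.mean(X1 @ v <= thresh) + np.mean(X2 @ v > thresh))

print(err_p, err_1)
```

The two rules are in fact the same linear rule up to a positive scaling, so the empirical errors agree.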
\[ error_{training} = \frac{1}{n}\sum_{i=1}^{n} l(y_i, f(\mathbf{x}_i)); \quad error_{testing} = \frac{1}{m}\sum_{i=1}^{m} l(z_i, f(\mathbf{t}_i)). \]
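These empirical errors are just averaged losses over the training and testing samples. A minimal sketch with the 0-1 loss (the threshold classifier and the data here are hypothetical, only to exercise the formula):

```python
import numpy as np

def empirical_error(f, X, labels):
    """(1/n) * sum_i l(label_i, f(x_i)) with the 0-1 loss l(a, b) = (a != b)."""
    return float(np.mean([label_i != f(x_i) for x_i, label_i in zip(X, labels)]))

# Hypothetical one-feature threshold classifier
f = lambda x: int(x[0] > 0.0)

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.5, size=(100, 2)); y_train = np.ones(100, dtype=int)
X_test  = rng.normal(loc=0.5, size=(50, 2));  y_test  = np.ones(50, dtype=int)

err_train = empirical_error(f, X_train, y_train)
err_test  = empirical_error(f, X_test, y_test)
print(err_train, err_test)
```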
[Figure: PCA vs. LDA projections]
\[ error_{training} = \frac{1}{n}\sum_{i=1}^{n} \left( \mathbf{v}^T \mathbf{x}_i - y_i \right)^2. \]
LDA
Fisherfaces (LDA)
Scatter matrices and separability criteria
- Within-class scatter matrix:
  \[ S_W = \sum_{j=1}^{C}\sum_{i=1}^{N_j} (\mathbf{x}_{ij} - \boldsymbol{\mu}_j)(\mathbf{x}_{ij} - \boldsymbol{\mu}_j)^T. \]
- Between-class scatter matrix:
  \[ S_B = \sum_{j=1}^{C} (\boldsymbol{\mu}_j - \boldsymbol{\mu})(\boldsymbol{\mu}_j - \boldsymbol{\mu})^T. \]
- Note that:
  \[ \hat{\Sigma} = S_W + S_B. \]
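A sketch of the scatter computations on synthetic data. Note the convention assumed here: each class term of \(S_B\) is weighted by its sample count \(N_j\), which is the variant under which \(S_W + S_B\) equals the total scatter exactly; the data are arbitrary.

```python
import numpy as np

def scatter_matrices(X, y):
    """S_W = sum_j sum_i (x_ij - mu_j)(x_ij - mu_j)^T,
       S_B = sum_j N_j (mu_j - mu)(mu_j - mu)^T  (count-weighted variant)."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
    return S_W, S_B

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.repeat([0, 1, 2], 20)

S_W, S_B = scatter_matrices(X, y)
S_T = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))   # total scatter
```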
- To formulate criteria for class separability, we need to convert these matrices to numerical values.
- Typical combinations of scatter matrices are:
  \[ tr(S_2^{-1} S_1), \quad \ln|S_1| - \ln|S_2|, \quad \frac{tr\,S_1}{tr\,S_2}, \]
  with \( \{S_1, S_2\} = \{S_B, S_W\}, \{S_B, \hat{\Sigma}\}, \) and \( \{S_W, \hat{\Sigma}\}. \)
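As a sanity check on such criteria, well-separated classes should score higher than overlapping ones. A minimal sketch of the trace criterion with \(\{S_1, S_2\} = \{S_B, S_W\}\), on two synthetic datasets of my own choosing:

```python
import numpy as np

def trace_criterion(X, y):
    """J = tr(S_W^{-1} S_B), one of the scatter-based separability measures."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
    return np.trace(np.linalg.solve(S_W, S_B))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X_close = rng.normal(size=(100, 2)) + 0.1 * y[:, None]   # overlapping classes
X_far   = rng.normal(size=(100, 2)) + 5.0 * y[:, None]   # well-separated classes

J_close = trace_criterion(X_close, y)
J_far   = trace_criterion(X_far, y)
print(J_close, J_far)
```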
A solution to LDA
- Again, we want to minimize the Bayes error.
- Therefore, we want the projection from Y to X that minimizes the error:
  \[ \hat{X} = \sum_{i=1}^{p} y_i \boldsymbol{\phi}_i + \sum_{i=p+1}^{n} b_i \boldsymbol{\phi}_i. \]
- The eigenvalue decomposition is the optimal transformation:
  \[ S_W^{-1} S_B \, \boldsymbol{\phi}_i = \lambda_i \boldsymbol{\phi}_i. \]
- Simultaneous diagonalization.
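The eigenvalue problem \(S_W^{-1} S_B \boldsymbol{\phi}_i = \lambda_i \boldsymbol{\phi}_i\) can be solved directly once the scatter matrices are built. A sketch on synthetic three-class data (means and sizes are arbitrary); it also exhibits the rank property discussed later, that at most \(C-1\) eigenvalues are nonzero:

```python
import numpy as np

def lda(X, y, k):
    """Top-k LDA directions from the eigendecomposition of S_W^{-1} S_B."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
    # S_W^{-1} S_B phi_i = lambda_i phi_i
    vals, vecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(vals.real)[::-1]
    return vals.real[order], vecs.real[:, order[:k]]

rng = np.random.default_rng(0)
C, n, d = 3, 40, 5
means = 3.0 * np.eye(d)[:C]                # three non-collinear class means
X = np.vstack([rng.normal(loc=m, size=(n, d)) for m in means])
y = np.repeat(np.arange(C), n)

vals, Phi = lda(X, y, k=2)
```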
Ronald Fisher (1890-1962)
- Fisher was an eminent scholar and one of the great scientists of the first part of the 20th century. After graduating from Cambridge and being denied entry to the British army because of his poor eyesight, he worked as a statistician for six years before starting a farming business. While a farmer, he continued his genetics and statistics research. During this time, he developed the well-known analysis of variance (ANOVA) method. After the war, Fisher finally moved to the Rothamsted Experimental Station. Among his many accomplishments, Fisher invented ANOVA, the technique of maximum likelihood (ML), Fisher information, the concept of sufficiency, and the method now known as Linear Discriminant Analysis (LDA). During World War II, the field of eugenics suffered a big blow, mainly due to the Nazis' use of it as a justification for some of their actions. Fisher moved back to Rothamsted and then to Cambridge, where he retired. Fisher is credited as one of the founders of modern statistics, and one cannot study pattern recognition without encountering several of his ground-breaking insights. Yet as great a statistician as he was, he also became a major figure in genetics. A classical quote in the Annals of Statistics reads: "I occasionally meet geneticists who ask me whether it is true that the great geneticist R.A. Fisher was also an important statistician."
Example: Face Recognition
Limitations of LDA
- To prevent \(S_W\) from becoming singular, we need N > d.
- There are only C-1 eigenvectors with nonzero eigenvalue.
- Nonparametric LDA is designed to solve the last problem (we'll see this later in the course).
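Both limitations are easy to see numerically: with fewer samples than dimensions the within-class scatter is rank deficient, and the between-class scatter never has rank above \(C-1\). A minimal sketch (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
C, n_per, d = 2, 5, 20            # N = 10 samples in d = 20 dimensions
X = rng.normal(size=(C * n_per, d))
y = np.repeat(np.arange(C), n_per)

mu = X.mean(axis=0)
S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
for c in range(C):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)
    S_W += (Xc - mu_c).T @ (Xc - mu_c)
    S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)

# With N <= d, S_W is rank deficient and cannot be inverted;
# S_B has rank at most C - 1.
print(np.linalg.matrix_rank(S_W), np.linalg.matrix_rank(S_B))
```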
PCA versus LDA
- In many applications the number of samples is relatively small compared to the dimensionality of the data.
- Even for simple PDFs, PCA can outperform LDA (on testing data).
- Again, this limits the number of features one can use.
- PCA is usually a safe choice, because all we try to do is minimize the representation error.

[Figure: underlying but unknown PDFs]
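With few samples, the estimate of \(S_W\) that LDA relies on is noisy, while PCA only fits the dominant variance directions. A small synthetic comparison (the dimensions, sample sizes, and nearest-mean classifier are arbitrary choices, not the slide's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 15, 2000          # few training samples per class

def sample(n):
    X0 = rng.normal(size=(n, d))
    X1 = rng.normal(size=(n, d)); X1[:, 0] += 4.0   # classes differ along dim 0
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

def nearest_mean_error(P):
    """Project with P (d x k), classify test points by the nearer class mean."""
    Ztr, Zte = Xtr @ P, Xte @ P
    m0 = Ztr[ytr == 0].mean(axis=0); m1 = Ztr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Zte - m1, axis=1)
            < np.linalg.norm(Zte - m0, axis=1)).astype(int)
    return float(np.mean(pred != yte))

# PCA: top principal component of the (small) training set
Xc = Xtr - Xtr.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P_pca = Vt[:1].T

# LDA: S_W^{-1}(mu0 - mu1), with S_W estimated from the same small set
mu0 = Xtr[ytr == 0].mean(axis=0); mu1 = Xtr[ytr == 1].mean(axis=0)
S_W = (Xtr[ytr == 0] - mu0).T @ (Xtr[ytr == 0] - mu0) \
    + (Xtr[ytr == 1] - mu1).T @ (Xtr[ytr == 1] - mu1)
P_lda = np.linalg.pinv(S_W) @ (mu0 - mu1)
P_lda = (P_lda / np.linalg.norm(P_lda)).reshape(-1, 1)

e_pca = nearest_mean_error(P_pca)
e_lda = nearest_mean_error(P_lda)
print(e_pca, e_lda)
```

Which method wins depends on the draw and on how small the sample is; the point is that LDA's advantage is not guaranteed once \(S_W\) must be estimated from few samples.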
Problems with Multi-class Eigen-based Algorithms
- In general, researchers define algorithms which are optimal in the 2-class case and then extend this idea (way of thinking) to the multi-class problem.
- This may cause problems.
- This is the case for eigen-based approaches which use the idea of scatter matrices defined above.
- Let's define the general case:
  \[ M_1 V = M_2 V \Lambda. \]
- This is the same as selecting those eigenvectors \(\mathbf{v}\) that maximize:
  \[ \frac{\mathbf{v}^T M_1 \mathbf{v}}{\mathbf{v}^T M_2 \mathbf{v}}. \]
- Note that this can only be achieved if \(M_1\) and \(M_2\) agree.
- The existence of a solution depends on the angle between the eigenvectors of \(M_1\) and \(M_2\).

Here \(\mathbf{v}_i\) is the ith basis vector of the solution space, \(\mathbf{w}_i\) are the eigenvectors of \(M_1\), and \(\mathbf{u}_i\) are the eigenvectors of \(M_2\).
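A sketch of the generalized problem on two arbitrary symmetric matrices standing in for the scatter matrices: the top generalized eigenvector of \(M_1 V = M_2 V \Lambda\) maximizes the ratio \(\mathbf{v}^T M_1 \mathbf{v} / \mathbf{v}^T M_2 \mathbf{v}\).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d)); M1 = A @ A.T               # symmetric (S_B-like)
B = rng.normal(size=(d, d)); M2 = B @ B.T + np.eye(d)   # symmetric PD (S_W-like)

# Generalized problem M1 V = M2 V Lambda, solved via M2^{-1} M1
vals, V = np.linalg.eig(np.linalg.solve(M2, M1))
vals, V = vals.real, V.real
top = V[:, np.argmax(vals)]

def rayleigh(v):
    """The ratio v^T M1 v / v^T M2 v maximized by the top eigenvector."""
    return (v @ M1 @ v) / (v @ M2 @ v)

print(rayleigh(top), vals.max())
```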
How to know?
- The larger K is, the less probable it is that the results will be correct:
\[ K = \sum_{i=1}^{r}\sum_{j=1}^{i} \alpha_{ij}^{2} = \sum_{i=1}^{r}\sum_{j=1}^{i} \left( \mathbf{w}_j^T \mathbf{u}_i \right)^{2}. \]
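K aggregates squared inner products between the eigenvectors \(\mathbf{w}_i\) of \(M_1\) and \(\mathbf{u}_i\) of \(M_2\). A minimal sketch of such an overlap computation on arbitrary symmetric matrices; as an illustrative simplification, it sums over all pairs of the leading r eigenvectors rather than the triangular limits above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 5, 2
A = rng.normal(size=(d, d)); M1 = A @ A.T
B = rng.normal(size=(d, d)); M2 = B @ B.T

# Leading r eigenvectors of each matrix: w_i of M1, u_i of M2
W = np.linalg.eigh(M1)[1][:, ::-1][:, :r]   # eigh sorts ascending; reverse
U = np.linalg.eigh(M2)[1][:, ::-1][:, :r]

# Overlap between the two eigenvector sets: sum of squared inner products.
# It is bounded by r, and equals r when the two subspaces coincide.
K = float(np.sum((W.T @ U) ** 2))
print(K)
```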