Machine Learning & Pattern Recognition
Discriminant Analysis
Aleix M. Martinez
aleix@ece.osu.edu

PCA
Eigenfaces (PCA)
Linear Discriminant Analysis
- If we have samples corresponding to two or more classes, we prefer to select those features that best discriminate between classes, rather than those that best describe the data.
- This will, of course, depend on the classifier.
- Assume our classifier is Bayes.
- Thus, we want to minimize the probability of error.
- We will develop a method based on scatter matrices.
Theorem
- Let the samples of two classes be Normally distributed in \(R^p\), with common covariance matrix \(\Sigma\). Then, the Bayes errors in the p-dimensional space and in the one-dimensional subspace given by \( \mathbf{v} = \Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) / \|\Sigma^{-1}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)\| \) are the same, where \(\|\mathbf{x}\|\) is the Euclidean norm of the vector \(\mathbf{x}\).
- That is, there is no loss in classification when reducing from p dimensions to one.
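The theorem can be checked numerically: sample two Gaussians with a shared covariance, apply the Bayes rule both in \(R^p\) and after projecting onto \(\mathbf{v}\), and compare the empirical errors. The sketch below does this; the particular means, covariance, and sample sizes are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
mu1, mu2 = np.zeros(p), np.ones(p)
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)          # shared (common) covariance

# Direction from the theorem: v = Sigma^{-1}(mu1 - mu2), normalized
w = np.linalg.solve(Sigma, mu1 - mu2)
v = w / np.linalg.norm(w)

n = 20000
X1 = rng.multivariate_normal(mu1, Sigma, n)
X2 = rng.multivariate_normal(mu2, Sigma, n)

# Bayes rule in p dimensions (equal priors, shared covariance -> linear rule)
b = -0.5 * w @ (mu1 + mu2)
err_p = 0.5 * (np.mean(X1 @ w + b <= 0) + np.mean(X2 @ w + b > 0))

# Bayes rule after projecting onto v (1-D, equal variances, midpoint threshold)
thresh = 0.5 * (v @ mu1 + v @ mu2)       # note v @ mu1 > v @ mu2 here
err_1 = 0.5 * (np.mean(X1 @ v <= thresh) + np.mean(X2 @ v > thresh))

print(err_p, err_1)
```

The two rules are in fact the same linear rule up to a positive scaling, so the empirical errors agree.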
\[ error_{training} = \frac{1}{n}\sum_{i=1}^{n} l(y_i, f(\mathbf{x}_i)); \quad error_{testing} = \frac{1}{m}\sum_{i=1}^{m} l(z_i, f(\mathbf{t}_i)). \]
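These empirical errors are just averaged losses over the training and testing samples. A minimal sketch with the 0-1 loss (the threshold classifier and the data here are hypothetical, only to exercise the formula):

```python
import numpy as np

def empirical_error(f, X, labels):
    """(1/n) * sum_i l(label_i, f(x_i)) with the 0-1 loss l(a, b) = (a != b)."""
    return float(np.mean([label_i != f(x_i) for x_i, label_i in zip(X, labels)]))

# Hypothetical one-feature threshold classifier
f = lambda x: int(x[0] > 0.0)

rng = np.random.default_rng(0)
X_train = rng.normal(loc=0.5, size=(100, 2)); y_train = np.ones(100, dtype=int)
X_test  = rng.normal(loc=0.5, size=(50, 2));  y_test  = np.ones(50, dtype=int)

err_train = empirical_error(f, X_train, y_train)
err_test  = empirical_error(f, X_test, y_test)
print(err_train, err_test)
```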
[Figure: PCA vs. LDA projections]
\[ error_{training} = \frac{1}{n}\sum_{i=1}^{n} \left( \mathbf{v}^T \mathbf{x}_i - y_i \right)^2. \]
LDA
Fisherfaces (LDA)
Scatter matrices and separability criteria
- Within-class scatter matrix:
  \[ S_W = \sum_{j=1}^{C}\sum_{i=1}^{N_j} (\mathbf{x}_{ij} - \boldsymbol{\mu}_j)(\mathbf{x}_{ij} - \boldsymbol{\mu}_j)^T. \]
- Between-class scatter matrix:
  \[ S_B = \sum_{j=1}^{C} (\boldsymbol{\mu}_j - \boldsymbol{\mu})(\boldsymbol{\mu}_j - \boldsymbol{\mu})^T. \]
- Note that:
  \[ \hat{\Sigma} = S_W + S_B. \]
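A sketch of the scatter computations on synthetic data. Note the convention assumed here: each class term of \(S_B\) is weighted by its sample count \(N_j\), which is the variant under which \(S_W + S_B\) equals the total scatter exactly; the data are arbitrary.

```python
import numpy as np

def scatter_matrices(X, y):
    """S_W = sum_j sum_i (x_ij - mu_j)(x_ij - mu_j)^T,
       S_B = sum_j N_j (mu_j - mu)(mu_j - mu)^T  (count-weighted variant)."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
    return S_W, S_B

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.repeat([0, 1, 2], 20)

S_W, S_B = scatter_matrices(X, y)
S_T = (X - X.mean(axis=0)).T @ (X - X.mean(axis=0))   # total scatter
```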
- To formulate criteria for class separability, we need to convert these matrices to numerical values.
- Typical combinations of scatter matrices are:
  \[ tr(S_2^{-1} S_1), \quad \ln|S_1| - \ln|S_2|, \quad \frac{tr\,S_1}{tr\,S_2}, \]
  with \( \{S_1, S_2\} = \{S_B, S_W\}, \{S_B, \hat{\Sigma}\}, \) and \( \{S_W, \hat{\Sigma}\}. \)
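As a sanity check on such criteria, well-separated classes should score higher than overlapping ones. A minimal sketch of the trace criterion with \(\{S_1, S_2\} = \{S_B, S_W\}\), on two synthetic datasets of my own choosing:

```python
import numpy as np

def trace_criterion(X, y):
    """J = tr(S_W^{-1} S_B), one of the scatter-based separability measures."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
    return np.trace(np.linalg.solve(S_W, S_B))

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X_close = rng.normal(size=(100, 2)) + 0.1 * y[:, None]   # overlapping classes
X_far   = rng.normal(size=(100, 2)) + 5.0 * y[:, None]   # well-separated classes

J_close = trace_criterion(X_close, y)
J_far   = trace_criterion(X_far, y)
print(J_close, J_far)
```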
A solution to LDA
- Again, we want to minimize the Bayes error.
- Therefore, we want the projection from Y to X that minimizes the error:
  \[ \hat{X} = \sum_{i=1}^{p} y_i \boldsymbol{\phi}_i + \sum_{i=p+1}^{n} b_i \boldsymbol{\phi}_i. \]
- The eigenvalue decomposition is the optimal transformation:
  \[ S_W^{-1} S_B \, \boldsymbol{\phi}_i = \lambda_i \boldsymbol{\phi}_i. \]
- Simultaneous diagonalization.
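The eigenvalue problem \(S_W^{-1} S_B \boldsymbol{\phi}_i = \lambda_i \boldsymbol{\phi}_i\) can be solved directly once the scatter matrices are built. A sketch on synthetic three-class data (means and sizes are arbitrary); it also exhibits the rank property discussed later, that at most \(C-1\) eigenvalues are nonzero:

```python
import numpy as np

def lda(X, y, k):
    """Top-k LDA directions from the eigendecomposition of S_W^{-1} S_B."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        S_W += (Xc - mu_c).T @ (Xc - mu_c)
        S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)
    # S_W^{-1} S_B phi_i = lambda_i phi_i
    vals, vecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(vals.real)[::-1]
    return vals.real[order], vecs.real[:, order[:k]]

rng = np.random.default_rng(0)
C, n, d = 3, 40, 5
means = 3.0 * np.eye(d)[:C]                # three non-collinear class means
X = np.vstack([rng.normal(loc=m, size=(n, d)) for m in means])
y = np.repeat(np.arange(C), n)

vals, Phi = lda(X, y, k=2)
```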
Ronald Fisher (1890-1962)
- Fisher was an eminent scholar and one of the great scientists of the first part of the 20th century. After graduating from Cambridge and being denied entry to the British army because of his poor eyesight, he worked as a statistician for six years before starting a farming business. While a farmer, he continued his genetics and statistics research. During this time, he developed the well-known analysis of variance (ANOVA) method. After the war, Fisher finally moved to the Rothamsted Experimental Station. Among his many accomplishments, Fisher invented ANOVA, the technique of maximum likelihood (ML), Fisher information, the concept of sufficiency, and the method now known as Linear Discriminant Analysis (LDA). During World War II, the field of eugenics suffered a big blow, mainly due to the Nazis' use of it as a justification for some of their actions. Fisher moved back to Rothamsted and then to Cambridge, where he retired. Fisher is credited as one of the founders of modern statistics, and one cannot study pattern recognition without encountering several of his ground-breaking insights. Yet as great a statistician as he was, he also became a major figure in genetics. A classical quote in the Annals of Statistics reads: "I occasionally meet geneticists who ask me whether it is true that the great geneticist R.A. Fisher was also an important statistician."
Example: Face Recognition
Limitations of LDA
- To prevent \(S_W\) from becoming singular, we need N > d.
- There are only C-1 eigenvectors with nonzero eigenvalue.
- Nonparametric LDA is designed to solve the last problem (we'll see this later in the course).
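Both limitations are easy to see numerically: with fewer samples than dimensions the within-class scatter is rank deficient, and the between-class scatter never has rank above \(C-1\). A minimal sketch (the sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
C, n_per, d = 2, 5, 20            # N = 10 samples in d = 20 dimensions
X = rng.normal(size=(C * n_per, d))
y = np.repeat(np.arange(C), n_per)

mu = X.mean(axis=0)
S_W = np.zeros((d, d)); S_B = np.zeros((d, d))
for c in range(C):
    Xc = X[y == c]
    mu_c = Xc.mean(axis=0)
    S_W += (Xc - mu_c).T @ (Xc - mu_c)
    S_B += len(Xc) * np.outer(mu_c - mu, mu_c - mu)

# With N <= d, S_W is rank deficient and cannot be inverted;
# S_B has rank at most C - 1.
print(np.linalg.matrix_rank(S_W), np.linalg.matrix_rank(S_B))
```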
PCA versus LDA
- In many applications the number of samples is relatively small compared to the dimensionality of the data.
- Even for simple PDFs, PCA can outperform LDA (on testing data).
- Again, this limits the number of features one can use.
- PCA is usually a safe choice, because all we try to do is minimize the representation error.

[Figure: underlying but unknown PDFs]
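With few samples, the estimate of \(S_W\) that LDA relies on is noisy, while PCA only fits the dominant variance directions. A small synthetic comparison (the dimensions, sample sizes, and nearest-mean classifier are arbitrary choices, not the slide's experiment):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test = 20, 15, 2000          # few training samples per class

def sample(n):
    X0 = rng.normal(size=(n, d))
    X1 = rng.normal(size=(n, d)); X1[:, 0] += 4.0   # classes differ along dim 0
    return np.vstack([X0, X1]), np.repeat([0, 1], n)

Xtr, ytr = sample(n_train)
Xte, yte = sample(n_test)

def nearest_mean_error(P):
    """Project with P (d x k), classify test points by the nearer class mean."""
    Ztr, Zte = Xtr @ P, Xte @ P
    m0 = Ztr[ytr == 0].mean(axis=0); m1 = Ztr[ytr == 1].mean(axis=0)
    pred = (np.linalg.norm(Zte - m1, axis=1)
            < np.linalg.norm(Zte - m0, axis=1)).astype(int)
    return float(np.mean(pred != yte))

# PCA: top principal component of the (small) training set
Xc = Xtr - Xtr.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
P_pca = Vt[:1].T

# LDA: S_W^{-1}(mu0 - mu1), with S_W estimated from the same small set
mu0 = Xtr[ytr == 0].mean(axis=0); mu1 = Xtr[ytr == 1].mean(axis=0)
S_W = (Xtr[ytr == 0] - mu0).T @ (Xtr[ytr == 0] - mu0) \
    + (Xtr[ytr == 1] - mu1).T @ (Xtr[ytr == 1] - mu1)
P_lda = np.linalg.pinv(S_W) @ (mu0 - mu1)
P_lda = (P_lda / np.linalg.norm(P_lda)).reshape(-1, 1)

e_pca = nearest_mean_error(P_pca)
e_lda = nearest_mean_error(P_lda)
print(e_pca, e_lda)
```

Which method wins depends on the draw and on how small the sample is; the point is that LDA's advantage is not guaranteed once \(S_W\) must be estimated from few samples.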
Problems with Multi-class Eigen-based Algorithms
- In general, researchers define algorithms which are optimal in the 2-class case and then extend this idea (way of thinking) to the multi-class problem.
- This may cause problems.
- This is the case for eigen-based approaches which use the idea of scatter matrices defined above.
- Let's define the general case:
  \[ M_1 V = M_2 V \Lambda. \]
- This is the same as selecting those eigenvectors \(\mathbf{v}\) that maximize:
  \[ \frac{\mathbf{v}^T M_1 \mathbf{v}}{\mathbf{v}^T M_2 \mathbf{v}}. \]
- Note that this can only be achieved if \(M_1\) and \(M_2\) agree.
- The existence of a solution depends on the angle between the eigenvectors of \(M_1\) and \(M_2\).

Here \(\mathbf{v}_i\) is the ith basis vector of the solution space, \(\mathbf{w}_i\) are the eigenvectors of \(M_1\), and \(\mathbf{u}_i\) are the eigenvectors of \(M_2\).
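A sketch of the generalized problem on two arbitrary symmetric matrices standing in for the scatter matrices: the top generalized eigenvector of \(M_1 V = M_2 V \Lambda\) maximizes the ratio \(\mathbf{v}^T M_1 \mathbf{v} / \mathbf{v}^T M_2 \mathbf{v}\).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d)); M1 = A @ A.T               # symmetric (S_B-like)
B = rng.normal(size=(d, d)); M2 = B @ B.T + np.eye(d)   # symmetric PD (S_W-like)

# Generalized problem M1 V = M2 V Lambda, solved via M2^{-1} M1
vals, V = np.linalg.eig(np.linalg.solve(M2, M1))
vals, V = vals.real, V.real
top = V[:, np.argmax(vals)]

def rayleigh(v):
    """The ratio v^T M1 v / v^T M2 v maximized by the top eigenvector."""
    return (v @ M1 @ v) / (v @ M2 @ v)

print(rayleigh(top), vals.max())
```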
How to know?
- The larger K is, the less probable it is that the results will be correct:
\[ K = \sum_{i=1}^{r}\sum_{j=1}^{i} \alpha_{ij}^{2} = \sum_{i=1}^{r}\sum_{j=1}^{i} \left( \mathbf{w}_j^T \mathbf{u}_i \right)^{2}. \]
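K aggregates squared inner products between the eigenvectors \(\mathbf{w}_i\) of \(M_1\) and \(\mathbf{u}_i\) of \(M_2\). A minimal sketch of such an overlap computation on arbitrary symmetric matrices; as an illustrative simplification, it sums over all pairs of the leading r eigenvectors rather than the triangular limits above.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 5, 2
A = rng.normal(size=(d, d)); M1 = A @ A.T
B = rng.normal(size=(d, d)); M2 = B @ B.T

# Leading r eigenvectors of each matrix: w_i of M1, u_i of M2
W = np.linalg.eigh(M1)[1][:, ::-1][:, :r]   # eigh sorts ascending; reverse
U = np.linalg.eigh(M2)[1][:, ::-1][:, :r]

# Overlap between the two eigenvector sets: sum of squared inner products.
# It is bounded by r, and equals r when the two subspaces coincide.
K = float(np.sum((W.T @ U) ** 2))
print(K)
```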