Lecture 13: Principal Component Analysis
Brett Bernstein
CDS at NYU
April 25, 2017
Initial Question

Intro Question: Let S ∈ R^(n×n) be symmetric.

1. How does trace S relate to the spectral decomposition S = W Λ W^T, where W is orthogonal and Λ is diagonal with the eigenvalues of S on its diagonal?
2. How do you solve w* = argmax_{||w||_2 = 1} w^T S w? What is w*^T S w*?
Solutions:

1. We use the following useful property of traces: trace(AB) = trace(BA). Thus trace S = trace(W Λ W^T) = trace(Λ W^T W) = trace Λ, so trace S is the sum of the eigenvalues of S.
2. w* is an eigenvector of S corresponding to the largest eigenvalue λ1. Then w*^T S w* = λ1.
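As a quick numerical sanity check, here is a short NumPy sketch (a hypothetical example, not from the slides) confirming both facts:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    S = (A + A.T) / 2                     # symmetrize to get a symmetric S

    eigvals, eigvecs = np.linalg.eigh(S)  # spectral decomposition S = W Lambda W^T
    print(np.isclose(np.trace(S), eigvals.sum()))         # trace S = sum of eigenvalues

    w_star = eigvecs[:, -1]               # eigenvector for the largest eigenvalue
    print(np.isclose(w_star @ S @ w_star, eigvals[-1]))   # w*^T S w* = lambda_1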
Principal Component Analysis (PCA)

1. Where did the y's go? There are none: PCA is an unsupervised method.
2. We try to find intrinsic structure in unlabeled data.
3. With PCA, we are looking for a low dimensional affine subspace that approximates our data well.
Definition of Principal Components

1. Throughout this lecture we will work with centered data.
2. Suppose X ∈ R^(n×d) is our data matrix, with one data point x_i^T per row. Define the sample mean x̄ = (1/n) Σ_{i=1}^n x_i.
3. Let X̄ ∈ R^(n×d) be the matrix with x̄^T in every row.
4. Define the centered data: X̃ = X − X̄, whose ith row is x̃_i^T = (x_i − x̄)^T.
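In code, centering is one line of NumPy broadcasting; a minimal sketch with a made-up 3×2 data matrix:

    import numpy as np

    X = np.array([[1.0, 2.0],
                  [3.0, 0.0],
                  [5.0, 4.0]])    # toy data matrix: n = 3 points, d = 2 features

    x_bar = X.mean(axis=0)        # sample mean of the rows
    X_tilde = X - x_bar           # broadcasting subtracts x_bar from every row
    print(X_tilde.mean(axis=0))   # ~ [0, 0]: the centered data has mean zero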
Fix a direction w ∈ R^d with ||w||_2 = 1. The sample variance of the data along w is (1/n) Σ_{i=1}^n (x̃_i^T w)^2 = (1/n) ||X̃w||_2^2.

1. This is also the sample variance of the projections x̃_1^T w, ..., x̃_n^T w.
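A small check of that identity (hypothetical data; the projections already have mean zero because X̃ is centered):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 2))
    X_tilde = X - X.mean(axis=0)          # centered data

    w = np.array([3.0, 4.0]) / 5.0        # a unit direction
    proj = X_tilde @ w                    # projections x_tilde_i^T w

    # (1/n)||X_tilde w||^2 equals the sample variance of the projections
    print(np.isclose(proj @ proj / len(proj), np.var(proj)))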
1. Define the first loading vector w(1) to be the direction giving the highest sample variance:
   w(1) = argmax_{||w||_2 = 1} (1/n) ||X̃w||_2^2.
2. The maximizer is not unique (w and −w give the same variance), so we choose one of them.
1. Define the kth loading vector w(k) to be the direction giving the highest sample variance that is orthogonal to the first k − 1 loading vectors:
   w(k) = argmax_{||w||_2 = 1, w ⊥ w(1), ..., w(k−1)} (1/n) ||X̃w||_2^2.
2. The complete set of loading vectors w(1), ..., w(d) forms an orthonormal basis of R^d.
1. Let W denote the matrix with the kth loading vector w(k) as its kth column.
2. Then W^T x̃_i gives the coordinates of x̃_i in the loading-vector basis; its kth entry x̃_i^T w(k) is called the kth principal component of x̃_i.
3. The rows of X̃W contain the principal components of each data point.
4. If we compute the singular value decomposition (SVD) of X̃ we obtain X̃ = V D W^T, where V ∈ R^(n×d) has orthonormal columns, D is diagonal with nonnegative entries, and the columns of W are the loading vectors.
5. Then X̃W = V D, so the principal components can be read directly off the SVD.
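A minimal SVD-based sketch (hypothetical data; note np.linalg.svd returns W^T as its third output):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 4))
    X_tilde = X - X.mean(axis=0)

    # SVD of the centered data: X_tilde = V D W^T
    V, d_vals, Wt = np.linalg.svd(X_tilde, full_matrices=False)
    W = Wt.T                        # columns are the loading vectors w(1), ..., w(d)

    scores = X_tilde @ W            # row i holds the principal components of x_tilde_i
    print(np.allclose(scores, V * d_vals))   # same as V D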
Computing Principal Components

For any unit vector w we can rewrite the sample variance along w as
(1/n) ||X̃w||_2^2 = w^T ((1/n) X̃^T X̃) w = w^T S w,
where S = (1/n) X̃^T X̃ is the sample covariance matrix of the data.
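A numerical check of the rewrite above (hypothetical data):

    import numpy as np

    rng = np.random.default_rng(0)
    X_tilde = rng.standard_normal((100, 3))
    X_tilde -= X_tilde.mean(axis=0)

    S = X_tilde.T @ X_tilde / len(X_tilde)   # sample covariance matrix
    w = np.array([1.0, 2.0, -1.0])
    w /= np.linalg.norm(w)                   # unit direction

    # variance along w computed two ways
    print(np.isclose(np.var(X_tilde @ w), w @ S @ w))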
1. This shows w(1) = argmax_{||w||_2 = 1} w^T S w.
2. By the introductory problem this implies w(1) is the eigenvector of S corresponding to the largest eigenvalue λ1.
3. We also learn that the variance along w(1) is λ1, the largest eigenvalue of S.
4. With a bit more work we can see that w(k) is the eigenvector of S corresponding to λk, the kth largest eigenvalue, and that the variance along w(k) is λk.
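So the loading vectors can also be computed by an eigendecomposition of S; a sketch (hypothetical data) confirming that the variance along w(1) is λ1:

    import numpy as np

    rng = np.random.default_rng(1)
    X_tilde = rng.standard_normal((500, 3)) @ np.diag([3.0, 1.0, 0.3])
    X_tilde -= X_tilde.mean(axis=0)

    S = X_tilde.T @ X_tilde / len(X_tilde)
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
    w1 = eigvecs[:, -1]                    # first loading vector

    print(np.isclose(np.var(X_tilde @ w1), eigvals[-1]))  # variance along w(1) is lambda_1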
Example: suppose each data point x_i ∈ R^2 contains two measurements of person i's height, one by tester 1 and one by the less careful tester 2.

1. Describe (vaguely) what you expect the sample covariance matrix to look like.
2. What do you think w(1) and w(2) are?
1. We expect tester 2 to have a larger variance than tester 1, and to be highly correlated with tester 1, since both are measuring the same heights.
2. We then expect w(1) to point roughly along (1, 1)/√2, the shared "height" direction along which the data clusters, and w(2) to point roughly along (1, −1)/√2, capturing the disagreement between the testers.
1. In our height example above, we can replace our two features with a single feature, the first principal component, while losing little information.
2. This can be used as a preprocessing step in a supervised learning algorithm, reducing the number of features.
3. When performing dimensionality reduction, one must choose how many principal components k to keep. A scree plot, which shows the eigenvalues of S in decreasing order, can guide the choice (see the sketch after this list).
4. Often people look for an "elbow" in the scree plot: a point where the plot becomes significantly flatter, and keep only the components before it.
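A sketch of how one might produce a scree plot with NumPy and matplotlib (hypothetical data):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    X = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 10))
    X_tilde = X - X.mean(axis=0)

    S = X_tilde.T @ X_tilde / len(X_tilde)
    eigvals = np.linalg.eigvalsh(S)[::-1]   # eigenvalues in decreasing order

    plt.plot(range(1, len(eigvals) + 1), eigvals, "o-")
    plt.xlabel("principal component")
    plt.ylabel("eigenvalue (variance)")
    plt.title("Scree plot: look for the elbow")
    plt.show()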
[Scree plot figure omitted; from Jolliffe, Principal Component Analysis.]
1. Visualization: if we have high dimensional data, it can be hard to plot. Projecting onto the first 2 or 3 principal components gives a low dimensional picture of the data.

[Figure omitted; source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2735096/]
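A minimal sketch of this kind of plot (hypothetical data standing in for a real high dimensional dataset):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    X = rng.standard_normal((150, 20))       # stand-in for high dimensional data
    X_tilde = X - X.mean(axis=0)

    _, _, Wt = np.linalg.svd(X_tilde, full_matrices=False)
    scores = X_tilde @ Wt[:2].T              # project onto the first 2 loading vectors

    plt.scatter(scores[:, 0], scores[:, 1])
    plt.xlabel("first principal component")
    plt.ylabel("second principal component")
    plt.show()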
1. Suppose we want to build a linear model from a dataset (x_1, y_1), ..., (x_n, y_n) with many, possibly correlated, features.
2. We can choose some k and replace each x̃_i with its first k principal components W_k^T x̃_i, where W_k contains the first k loading vectors.
3. This is called principal component regression, and it can be thought of as a discrete relative of ridge regression: ridge shrinks the low-variance principal directions, while PCR discards them entirely.
4. Correlated features may be grouped together into a single principal component (a sketch follows).
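A minimal principal component regression sketch on synthetic data (k and the dataset are made up for illustration; new points must be centered with the same x̄ and projected with the same W_k):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, k = 200, 10, 3                     # keep k principal components
    X = rng.standard_normal((n, d))
    y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

    x_bar = X.mean(axis=0)
    X_tilde = X - x_bar
    _, _, Wt = np.linalg.svd(X_tilde, full_matrices=False)
    W_k = Wt[:k].T                           # first k loading vectors

    Z = X_tilde @ W_k                        # new k-dimensional features (the scores)
    beta, *_ = np.linalg.lstsq(Z, y - y.mean(), rcond=None)

    y_hat = Z @ beta + y.mean()              # in-sample predictions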
Other Comments About PCA

1. What happens if you scale one of the features by a huge factor?
2. It will have a huge variance and become a dominant part of the first loading vector.
3. To add scale-invariance to the process, people often standardize their data, scaling each (centered) feature to have sample variance 1, before performing PCA.
4. This is the same as using the correlation matrix in place of the covariance matrix, as the check below illustrates.
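A short check of that equivalence (hypothetical data with one wildly scaled feature):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 3)) * np.array([1.0, 1.0, 1000.0])  # feature 3 hugely scaled

    X_tilde = X - X.mean(axis=0)
    X_std = X_tilde / X_tilde.std(axis=0)    # standardize each feature

    # covariance of the standardized data equals the correlation matrix of X
    S_std = X_std.T @ X_std / len(X_std)
    print(np.allclose(S_std, np.corrcoef(X, rowvar=False)))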
1. One measure of how dispersed our data is, is the total variance Δ = (1/n) Σ_{i=1}^n ||x̃_i||_2^2.
2. A little algebra shows this is trace S, where S is the sample covariance matrix, and hence (by the introductory problem) Δ = λ1 + ... + λd.
3. If we project onto the first k principal components, the resulting data has dispersion λ1 + ... + λk.
4. We can choose k to account for a desired percentage of Δ (see the sketch after this list).
5. The subspace spanned by the first k loading vectors maximizes the resulting dispersion over all k-dimensional subspaces.
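A sketch of choosing the smallest k that accounts for, say, 95% of Δ (the data and the threshold are made up):

    import numpy as np

    rng = np.random.default_rng(0)
    X_tilde = rng.standard_normal((300, 8)) @ np.diag([5.0, 4.0, 3.0, 1.0, 1.0, 0.5, 0.2, 0.1])
    X_tilde -= X_tilde.mean(axis=0)

    S = X_tilde.T @ X_tilde / len(X_tilde)
    eigvals = np.linalg.eigvalsh(S)[::-1]        # lambda_1 >= ... >= lambda_d

    frac = np.cumsum(eigvals) / eigvals.sum()    # eigvals.sum() is Delta = trace S
    k = int(np.searchsorted(frac, 0.95)) + 1     # smallest k with frac[k-1] >= 0.95
    print(k, frac[k - 1])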
1. The k-dimensional subspace V spanned by w(1), ..., w(k) best fits the centered data in the least-squares sense: it minimizes the squared reconstruction error Σ_{i=1}^n ||x̃_i − P_V x̃_i||_2^2 over all k-dimensional subspaces.
2. Converting your data into principal components can sometimes hurt interpretability, since each new feature is a linear combination of all of the old features.
3. The smallest principal components, if they correspond to eigenvalues near zero, indicate directions along which the data is nearly constant, i.e., approximate linear dependencies among the features.
1. PCA only finds linear structure. In general, we can deal with non-linear structure by adding features or by using kernels.
2. Using kernels results in the technique called kernel PCA.
3. Below we added the feature x̃_3 = x̃_1^2 + x̃_2^2 so that the non-linear structure becomes linear in the enlarged feature space. [Figure omitted.]
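A sketch of that feature-addition trick; the original figure is not reproduced here, so the data (two concentric circles) is a hypothetical stand-in:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, 200)
    r = rng.choice([1.0, 3.0], size=200)      # points on two concentric circles
    X = np.column_stack([r * np.cos(theta), r * np.sin(theta)])

    # add x3 = x1^2 + x2^2: the squared radius separates the two circles linearly
    X3 = np.column_stack([X, X[:, 0] ** 2 + X[:, 1] ** 2])
    print(np.unique(X3[:, 2].round()))        # [1. 9.]: the new feature splits the groups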